DIFFERENTIABILITY OF M-FUNCTIONALS OF LOCATION 
AND SCATTER BASED ON T LIKELIHOODS 



By R. M. Dudley, Sergiy Sidenko and Zuoqin Wang

Abstract. The paper aims at finding widely and smoothly defined nonparametric location and scatter functionals. As a convenient vehicle, maximum likelihood estimation of the location vector $\mu$ and scatter matrix $\Sigma$ of an elliptically symmetric $t$ distribution on $\mathbb{R}^d$ with degrees of freedom $\nu > 1$ extends to an M-functional defined on all probability distributions $P$ in a weakly open, weakly dense domain $U$. Here $U$ consists of $P$ putting not too much mass in hyperplanes of dimension $< d$, as shown for empirical measures by Kent and Tyler [Ann. Statist. (1991)]. It is shown here that $(\mu, \Sigma)$ is analytic on $U$ for the bounded Lipschitz norm, or, for $d = 1$, for the sup norm on distribution functions. For $k = 1, 2, \dots$, and other norms, depending on $k$ and more directly adapted to $t$ functionals, one has continuous differentiability of order $k$, allowing the delta-method to be applied to $(\mu, \Sigma)$ for any $P$ in $U$, which can be arbitrarily heavy-tailed. These results imply asymptotic normality of the corresponding M-estimators $(\mu_n, \Sigma_n)$. In dimension $d = 1$ only, the functional $(\mu, \sigma)$ extends to be defined and weakly continuous at all $P$.

1. Introduction 

This paper is a longer version, with proofs, of the paper Dudley, Sidenko and Wang (2009). It aims at developing some nonparametric location and scatter functionals, defined and smooth on large (weakly dense and open) sets of distributions. The nonparametric view is much as in the work of Bickel and Lehmann (1975) (but not adopting, e.g., their monotonicity axiom) and, to a somewhat lesser extent, that of Davies (1998). Although there are relations to robustness, that is not the main aim here: there is no focus on neighborhoods of model distributions with densities such as the normal. It happens that the parametric family of ellipsoidally symmetric $t$ densities provides an avenue toward nonparametric location and scatter functionals, somewhat as maximum likelihood estimation of location for the double-exponential distribution in one dimension gives the median, generally viewed as a nonparametric functional.

Given observations $X_1, \dots, X_n$ in $\mathbb{R}^d$, let $P_n := n^{-1}\sum_{j=1}^n \delta_{X_j}$. Given $P_n$ and the location-scatter family of elliptically symmetric $t_\nu$ distributions on $\mathbb{R}^d$ with $\nu > 1$, maximum likelihood estimates of the location vector $\mu$ and scatter matrix $\Sigma$ exist and are unique for "most" $P_n$. Namely, it suffices that $P_n(J) < (\nu + q)/(\nu + d)$ for each affine hyperplane $J$ of dimension $q < d$, as shown by Kent and Tyler (1991). The estimates extend to M-functionals defined at all probability measures $P$ on $\mathbb{R}^d$ satisfying the same condition; that is shown for integer $\nu$ and in the sense of unique critical points by Dümbgen and Tyler (2005), and for any $\nu > 0$ and M-functionals in the sense of unique absolute minima in Theorem 3, in light of Theorem 6(a), for pure scatter, and then in Theorem 6(e) for location and scatter with $\nu > 1$. A method of reducing location and scatter functionals in dimension $d$ to pure scatter functionals in dimension $d + 1$ was shown to work for $t$ distributions by Kent and Tyler (1991) and only for such distributions by Kent, Tyler and Vardi (1994), as will be recalled after Theorem 6.

So the $t$ functionals are defined on a weakly open and weakly dense domain, whose complement is thus weakly nowhere dense. One of the main results of the present paper gives analyticity (defined in the Appendix) of the functionals on this domain, with respect to the bounded Lipschitz norm (Theorem 9(d)). An adaptation gives differentiability of any given finite order $k$ with respect to norms, depending on $k$, chosen to give asymptotic normality of the $t$ location and scatter functionals (Theorem 15) for arbitrarily heavy-tailed $P$ (for such $P$, the central limit theorem fails in the bounded Lipschitz norm). In turn, this yields delta-method conclusions (Theorem I^UT b)), uniformly over suitable families of distributions (Proposition [2^); these statements don't include any norms, although their proofs do. It follows in Corollary 21 that continuous Fréchet differentiability of the $t_\nu$ location and scatter functionals of order $k$ also holds with respect to affinely invariant norms defined via suprema over positivity sets of polynomials of degree at most $2k + 4$.

For the delta-method, one needs at least differentiability of first order. To get 
first derivatives with respect to probability measures P via an implicit function 
theorem we use second order derivatives with respect to matrices. Moreover, 
second order derivatives with respect to P (or in the classical case, with respect 
to an unknown parameter) can improve the accuracy of the delta-method and 
the speed of convergence of approximations. It turns out that derivatives of 
arbitrarily high order are obtainable with little additional difficulty. 

For norms in which the central limit theorem for empirical measures holds for all probability measures, such as those just mentioned, bootstrap central limit theorems also hold [Giné and Zinn (1990)], which then via the delta-method can give bootstrap confidence sets for the $t$ location and scatter functionals.

In dimension $d = 1$, the domain on which differentiability is proved is the class of distributions having no atom of size $\nu/(\nu + 1)$ or larger. On this domain, analyticity is proved, in Theorem 9(e), with respect to the usual supremum norm for distribution functions. Only for $d = 1$ does it turn out to be possible to extend the location and scatter (scale) functionals to be defined and weakly continuous at arbitrary distributions (Theorem 23).

For general $d > 1$ and $\nu = 1$ (multivariate Cauchy distributions), not covered by the present paper, Dümbgen (1998, §6) briefly treats location and scatter functionals and their asymptotic properties.

Weak continuity on a dense open set implies that for distributions in that set, estimators (functionals of empirical measures) eventually exist almost surely and converge to the functional of the distribution. Weak continuity, where it holds, also is a robustness property in itself and implies a strictly positive (not necessarily large) breakdown point. The $t_\nu$ functionals, as redescending M-functionals, downweight outliers. Among such M-functionals, only the $t_\nu$ functionals are known to be uniquely defined on a satisfactorily large domain. The $t_\nu$ estimators are $\sqrt{n}$-consistent estimators of their functionals, and each $t_\nu$ location functional, at any distribution in its domain that is symmetric around a point, equals (by equivariance) the center of symmetry.

It seems that few other known location and scatter functionals exist and are 
unique and continuous, let alone differentiable, on a dense open domain. For 
example, the median is discontinuous on a dense set. Smoothly trimmed means 
and variances are defined and differentiable at all distributions in one dimension, 
e.g. Boos (1979) for means. In higher dimensions there are analogues of trimming, 
called peeling or depth weighting, e.g. the work of Zuo and Cui (2005). Location- 
scatter functionals differentiable on a dense domain apparently have not been 
found by depth weighting thus far (in dimension d > 1). 

The t location and scatter functionals, on their domain, can be effectively com- 
puted via EM algorithms [cf. Kent, Tyler and Vardi (1994, §4); Arslan, Constable, 
and Kent (1995); Liu, Rubin and Wu (1998)]. 
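The cited EM algorithms are not reproduced here. As a hedged illustration only, the following Python sketch implements the standard EM-type fixed-point iteration for the $t_\nu$ location and scatter M-estimate with $\nu$ known. The function name `t_location_scatter`, the starting values and the stopping rule are illustrative assumptions, not taken from the cited papers. The weights are $u_{\nu,d}(s) = (\nu + d)/(\nu + s)$, introduced with (33) in Section 4. At an exact fixed point the weights average exactly 1, the empirical form of equation (35) in Theorem 6(d), so that is what the check tests.

```python
import numpy as np

def t_location_scatter(Y, nu, tol=1e-12, max_iter=10_000):
    """EM-type fixed-point iteration for the t_nu location/scatter M-estimate (nu known).

    Y is an (n, d) array of observations. Returns (mu, Sigma, w), where w holds the
    final weights u_{nu,d}(s_i) = (nu + d)/(nu + s_i) and s_i is the squared
    Mahalanobis distance of y_i from mu in the metric Sigma^{-1}.
    """
    n, d = Y.shape
    mu = Y.mean(axis=0)
    Sigma = np.cov(Y, rowvar=False, bias=True)
    for _ in range(max_iter):
        R = Y - mu
        s = np.einsum("ij,jk,ik->i", R, np.linalg.inv(Sigma), R)
        w = (nu + d) / (nu + s)                      # E-step: weights u_{nu,d}(s_i)
        mu_new = w @ Y / w.sum()                     # M-step: weighted mean
        R = Y - mu_new
        Sigma_new = (w[:, None] * R).T @ R / n       # M-step: weighted scatter
        converged = (np.abs(mu_new - mu).max() < tol
                     and np.abs(Sigma_new - Sigma).max() < tol)
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    R = Y - mu
    s = np.einsum("ij,jk,ik->i", R, np.linalg.inv(Sigma), R)
    return mu, Sigma, (nu + d) / (nu + s)

# Illustrative heavy-tailed data (t_3 draws, linearly transformed and shifted).
rng = np.random.default_rng(1)
Y = rng.standard_t(3, size=(200, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]]) + np.array([1.0, -2.0])
mu, Sigma, w = t_location_scatter(Y, nu=3.0)
print(mu, Sigma, w.mean())
```

The mean-weight check follows from the fixed-point equation for $\Sigma$: taking traces gives $n^{-1}\sum_i w_i s_i = d$, and $w_i s_i = (\nu + d) - \nu w_i$.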

2. Definitions and preliminaries 

In this paper the sample space will be a finite-dimensional Euclidean space $\mathbb{R}^d$ with its usual topological and Borel structure. A law will mean a probability measure on $\mathbb{R}^d$. Let $\mathcal{S}_d$ be the collection of all $d \times d$ symmetric real matrices, $\mathcal{N}_d$ the subset of nonnegative definite symmetric matrices and $\mathcal{P}_d \subset \mathcal{N}_d$ the further subset of strictly positive definite symmetric matrices. The parameter spaces $\Theta$ considered will be $\mathcal{P}_d$, $\mathcal{N}_d$ (pure scatter matrices), $\mathbb{R}^d \times \mathcal{P}_d$, or $\mathbb{R}^d \times \mathcal{N}_d$. For $(\mu, \Sigma) \in \mathbb{R}^d \times \mathcal{N}_d$, $\mu$ will be viewed as a location parameter and $\Sigma$ as a scatter parameter, extending the notions of mean vector and covariance matrix to arbitrarily heavy-tailed distributions. Matrices in $\mathcal{N}_d$ but not in $\mathcal{P}_d$ will only be considered in one dimension, in Section 9, where the scale parameter $\sigma \ge 0$ corresponds to $\sigma^2 \in \mathcal{N}_1$.

Notions of "location" and "scale" or multidimensional "scatter" functional will be defined in terms of equivariance, as follows.

Definitions. Let $Q \mapsto \mu(Q) \in \mathbb{R}^d$, resp. $\Sigma(Q) \in \mathcal{N}_d$, be a functional defined on a set $\mathcal{D}$ of laws $Q$ on $\mathbb{R}^d$. Then $\mu$ (resp. $\Sigma$) is called an affinely equivariant location (resp. scatter) functional iff for any nonsingular $d \times d$ matrix $A$ and $v \in \mathbb{R}^d$, with $f(x) := Ax + v$, and any law $Q \in \mathcal{D}$, the image measure $P := Q \circ f^{-1}$ is also in $\mathcal{D}$, with $\mu(P) = A\mu(Q) + v$ or, respectively, $\Sigma(P) = A\Sigma(Q)A'$. For $d = 1$, $\sigma(\cdot)$ with $0 \le \sigma < \infty$ will be called an affinely equivariant scale functional iff $\sigma^2$ satisfies the definition of affinely equivariant scatter functional. If we have affinely equivariant location and scatter functionals $\mu$ and $\Sigma$ on the same domain $\mathcal{D}$, then $(\mu, \Sigma)$ will be called an affinely equivariant location-scatter functional on $\mathcal{D}$.
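A quick numerical illustration of the definition, using the classical sample mean and covariance. These are affinely equivariant, but only on laws with finite second moments, which is one motivation for the $t$ functionals. The empirical law of the transformed sample $f(X_i) = AX_i + v$ is the image measure $P_n \circ f^{-1}$. The matrix $A$, the shift $v$, and the random sample are arbitrary choices for illustration, and `mean_cov` is a hypothetical helper name.

```python
import numpy as np

def mean_cov(X):
    """Sample mean vector and (1/n-normalized) covariance matrix of the rows of X."""
    mu = X.mean(axis=0)
    R = X - mu
    return mu, R.T @ R / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 3.0]])  # det A = 5, nonsingular
v = np.array([1.0, -2.0, 0.5])
Y = X @ A.T + v                      # rows are f(x_i) = A x_i + v
mu_X, S_X = mean_cov(X)
mu_Y, S_Y = mean_cov(Y)
# Equivariance: mu(P) = A mu(Q) + v and Sigma(P) = A Sigma(Q) A'
print(np.allclose(mu_Y, A @ mu_X + v), np.allclose(S_Y, A @ S_X @ A.T))
```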

To define M-functionals, suppose we have a function $(x, \theta) \mapsto \rho(x, \theta)$ defined for $x \in \mathbb{R}^d$ and $\theta \in \Theta$, Borel measurable in $x$ and lower semicontinuous in $\theta$, i.e. $\rho(x, \theta) \le \liminf_{\phi \to \theta} \rho(x, \phi)$ for all $\theta$. For a law $Q$, let $Q\rho(\phi) := \int \rho(x, \phi)\,dQ(x)$ if the integral is defined (not $\infty - \infty$), as it always will be if $Q = P_n$. An M-estimate of $\theta$ for a given $n$ and $P_n$ will be a $\theta_n$ such that $P_n\rho(\theta)$ is minimized at $\theta = \theta_n$, if it exists and is unique. A measurable function, not necessarily defined a.s., whose values are M-estimates is called an M-estimator.

For a law $P$ on $\mathbb{R}^d$ and a given $\rho(\cdot, \cdot)$, a $\theta_1 = \theta_1(P)$ is called the M-functional of $P$ for $\rho$ if and only if there exists a measurable function $a(x)$, called an adjustment function, such that for $h(x, \theta) = \rho(x, \theta) - a(x)$, $Ph(\theta)$ is defined and satisfies $-\infty < Ph(\theta) \le +\infty$ for all $\theta \in \Theta$, and is minimized uniquely at $\theta = \theta_1(P)$, e.g. Huber (1967). As Huber showed, $\theta_1(P)$ doesn't depend on the choice of $a(\cdot)$, which can moreover be taken as $a(x) = \rho(x, \theta_2)$ for a suitable $\theta_2$.
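To illustrate the role of the adjustment function, here is a minimal sketch with an illustrative choice not taken from the paper: $d = 1$, $\rho(x, \theta) = |x - \theta|$ and $a(x) = |x|$. For a Cauchy law, $P\rho(\theta) = +\infty$ for every $\theta$, but $|h(x, \theta)| \le |\theta|$ by the triangle inequality, so $Ph(\theta)$ is finite. For an empirical law $P_n$ the minimizer is the sample median, matching the remark in the Introduction about the double-exponential likelihood. The names `h` and `P_n_h` are hypothetical.

```python
import math
import random

def h(x, theta):
    """Adjusted criterion h(x, theta) = rho(x, theta) - a(x), rho = |x - theta|, a(x) = |x|."""
    return abs(x - theta) - abs(x)

def P_n_h(sample, theta):
    """P_n h(theta): the average of h(x_i, theta) over the sample."""
    return sum(h(x, theta) for x in sample) / len(sample)

random.seed(0)
# Standard Cauchy draws via the inverse c.d.f.; E|X - theta| is infinite for this law,
# but |h(x, theta)| <= |theta|, so P h(theta) is finite.
sample = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(101)]

# P_n h is convex and piecewise linear with kinks at the data points, so it is
# minimized at a data point; for odd n the minimizer is the sample median.
theta_hat = min(sample, key=lambda t: P_n_h(sample, t))
median = sorted(sample)[len(sample) // 2]
print(theta_hat, median)
```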

The following definition will be used for $d = 1$. Suppose we have a parameter space $\Theta$, specifically $\mathcal{P}_d$ or $\mathcal{P}_d \times \mathbb{R}^d$, which has a closure $\bar\Theta$, specifically $\mathcal{N}_d$ or $\mathcal{N}_d \times \mathbb{R}^d$ respectively. The boundary of $\Theta$ is then $\bar\Theta \setminus \Theta$. The functions $\rho$ and $h$ are not necessarily defined for $\theta$ in the boundary, but M-functionals may have values anywhere in $\bar\Theta$ according to the following.

Definition. A $\theta_0 = \theta_0(P) \in \bar\Theta$ will be called the (extended) M-functional of $P$ for $\rho$ or $h$ if and only if for every neighborhood $U$ of $\theta_0$,

The above definition extends that of M-functional given by Huber (1967) in that if $\theta_0$ is on the boundary of $\Theta$ then $h(x, \theta_0)$ is not defined, $Ph(\theta_0)$ is defined only in a lim inf sense, and at $\theta_0$ (but only there), the lim inf may be $-\infty$.

From the definition, an M-functional, if it exists, must be unique. If $P$ is an empirical measure $P_n$, then the M-functional $\theta_n := \theta_0(P_n)$, if it exists, is the maximum likelihood estimate of $\theta$, in a lim sup sense if $\theta_n$ is on the boundary. Clearly, an M-estimate $\theta_n$ is the M-functional $\theta_1(P_n)$ if either exists.

For a differentiable function $f$, recall that a critical point of $f$ is a point where the gradient of $f$ is 0. For example, on $\mathbb{R}^2$ let $f(x, y) = x^2(1 + y)^3 + y^2$. Then $f$ has a unique critical point $(0, 0)$, which is a strict relative minimum where the Hessian (matrix of second partial derivatives) is $\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, but not an absolute minimum since $f(1, y) \to -\infty$ as $y \to -\infty$. This example appeared in Durfee, Kronenfeld, Munson, Roy, and Westby (1993).
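The example can be checked by hand; the following Python sketch just encodes the gradient and Hessian of $f(x, y) = x^2(1 + y)^3 + y^2$. If $x \ne 0$, the first gradient component vanishes only at $y = -1$, where the second component equals $-2$, so $(0, 0)$ is the only critical point.

```python
def f(x, y):
    return x**2 * (1 + y)**3 + y**2

def grad(x, y):
    """Gradient (df/dx, df/dy)."""
    return (2 * x * (1 + y)**3, 3 * x**2 * (1 + y)**2 + 2 * y)

def hessian(x, y):
    """Matrix of second partial derivatives."""
    return ((2 * (1 + y)**3, 6 * x * (1 + y)**2),
            (6 * x * (1 + y)**2, 6 * x**2 * (1 + y) + 2))

print(grad(0, 0), hessian(0, 0))   # (0, 0) ((2, 0), (0, 2))
print(grad(3.0, -1.0))             # second component is -2: no critical point on y = -1
print(f(1, -10), f(0, 0))          # -629 0: f is unbounded below along x = 1
```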

3. Multivariate scatter 

This section will treat the pure scatter problem in $\mathbb{R}^d$, with parameter space $\Theta = \mathcal{P}_d$. The results here are extensions of those of Kent and Tyler (1991, Theorems 2.1 and 2.2), on unique maximum likelihood estimates for finite samples, to the case of M-functionals for general laws on $\mathbb{R}^d$.

For $A \in \mathcal{P}_d$ and a function $\rho$ from $[0, \infty)$ into itself, consider the function

(2) $L(y, A) := \tfrac12\log\det A + \rho(y'A^{-1}y)$, $y \in \mathbb{R}^d$.

For adjustment, let

(3) $h(y, A) := L(y, A) - L(y, I)$,

where $I$ is the identity matrix. Then

(4) $Qh(A) = \tfrac12\log\det A + \int \rho(y'A^{-1}y) - \rho(y'y)\,dQ(y)$

if the integral is defined.

As a referee suggested, one can differentiate functions of matrices in a coordinate-free way, as follows. The $d^2$-dimensional vector space of all $d \times d$ real matrices becomes a Hilbert space (Euclidean space) under the inner product $\langle A, B\rangle := \operatorname{trace}(A'B)$. It's easy to verify that this is indeed an inner product and is invariant under orthogonal changes of coordinates in the underlying $d$-dimensional vector space. The corresponding norm $\|A\|_F := \langle A, A\rangle^{1/2}$ is called the Frobenius norm. Here $\|A\|_F^2$ is simply the sum of squares of all elements of $A$, and $\|\cdot\|_F$ is the specialization of the (Hilbert-)Schmidt norm for Hilbert-Schmidt operators on a general Hilbert space to the case of (all) linear operators on a finite-dimensional Hilbert space. Let $\|\cdot\|$ be the usual matrix or operator norm, $\|A\| := \sup_{|x|=1}|Ax|$. Then

(5) $\|A\| \le \|A\|_F \le \sqrt{d}\,\|A\|$,

with equality in the latter for $A = I$ and in the former when $A = \operatorname{diag}(1, 0, \dots, 0)$. In statements such as $\|\Delta\| \to 0$ or expressions such as $O(\|\Delta\|)$ the particular norm doesn't matter for fixed $d$.
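A numerical sanity check of (5) and of the inner product, using NumPy's spectral norm (`np.linalg.norm(A, 2)`, the largest singular value, which equals $\sup_{|x|=1}|Ax|$) and Frobenius norm (`np.linalg.norm(A, "fro")`). Random matrices are only an illustration, not a proof; the helper names are hypothetical.

```python
import numpy as np

def op_norm(A):
    """Operator norm sup_{|x|=1} |Ax| (largest singular value)."""
    return np.linalg.norm(A, 2)

def frob_norm(A):
    """Frobenius norm <A, A>^{1/2} = trace(A'A)^{1/2}."""
    return np.linalg.norm(A, "fro")

d = 4
rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.normal(size=(d, d))
    B = rng.normal(size=(d, d))
    # <A, B> = trace(A'B) is the sum of entrywise products
    assert np.isclose(np.trace(A.T @ B), np.sum(A * B))
    assert op_norm(A) <= frob_norm(A) + 1e-12
    assert frob_norm(A) <= np.sqrt(d) * op_norm(A) + 1e-12

I = np.eye(d)                            # equality in the right-hand inequality
E = np.diag([1.0] + [0.0] * (d - 1))     # equality in the left-hand inequality
print(frob_norm(I), np.sqrt(d) * op_norm(I), op_norm(E), frob_norm(E))
```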

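The map $A \mapsto A^{-1}$ is $C^\infty$ from $\mathcal{P}_d$ onto itself. For fixed $A \in \mathcal{P}_d$ and as $\|\Delta\| \to 0$, we have

(6) $(A + \Delta)^{-1} = A^{-1} - A^{-1}\Delta A^{-1} + O(\|\Delta\|^2)$,

as is seen since $(A + \Delta)(A^{-1} - A^{-1}\Delta A^{-1}) = I + O(\|\Delta\|^2)$, then multiplying by $(A + \Delta)^{-1}$.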
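The expansion (6) can be checked numerically. A short computation gives the exact remainder $(A + \Delta)^{-1} - A^{-1} + A^{-1}\Delta A^{-1} = (A + \Delta)^{-1}\Delta A^{-1}\Delta A^{-1}$, so for $\Delta = tD$ its norm divided by $t^2$ should be roughly constant as $t \to 0$. The matrices $A$ and $D$ below are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 0.5], [0.5, 1.0]])      # a fixed matrix in P_2
D = np.array([[0.3, -0.1], [-0.1, 0.2]])    # a fixed symmetric direction
A_inv = np.linalg.inv(A)

def remainder(t):
    """Operator norm of (A + tD)^{-1} - (A^{-1} - A^{-1} (tD) A^{-1})."""
    Delta = t * D
    first_order = A_inv - A_inv @ Delta @ A_inv
    return np.linalg.norm(np.linalg.inv(A + Delta) - first_order, 2)

def exact_remainder(t):
    """Operator norm of (A + tD)^{-1} (tD) A^{-1} (tD) A^{-1}."""
    Delta = t * D
    return np.linalg.norm(np.linalg.inv(A + Delta) @ Delta @ A_inv @ Delta @ A_inv, 2)

for t in (1e-1, 1e-2, 1e-3):
    print(f"t={t:g}  remainder/t^2={remainder(t) / t**2:.6f}")
```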

Differentiating $f(A)$ for $A \in \mathcal{S}_d$ is preferably done, when possible, in coordinate-free form, or, if in coordinates, when restricted to a subspace of matrices all diagonal in some fixed coordinates, or at least approaching such matrices. It turns out that all proofs in the paper can be and have been done in one of these ways.

We have the following, stated for $Q = Q_n$ an empirical measure in Kent and Tyler (1991, (1.3)). Here (7) is a redescending condition.

Proposition 1. Let $\rho : [0, \infty) \to [0, \infty)$ be continuous and have a bounded continuous derivative on $[0, \infty)$, where

$\rho'(0) := \rho'(0+) := \lim_{x \downarrow 0}[\rho(x) - \rho(0)]/x$.

Let $0 \le u(x) := 2\rho'(x)$ for $x \ge 0$ and suppose that

(7) $\sup_{0 < x < \infty} x\,u(x) < \infty$.

Then for any law $Q$ on $\mathbb{R}^d$, $Qh$ in (4) is a well-defined and $C^1$ function of $A \in \mathcal{P}_d$, which has a critical point at $A = B$ if and only if

(8) $B = \int u(y'B^{-1}y)\,yy'\,dQ(y)$.

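Proof. By the hypotheses, the chain rule, and (6) we have for fixed $A \in \mathcal{P}_d$ as $\|\Delta\| \to 0$,

$\rho(y'(A + \Delta)^{-1}y) - \rho(y'A^{-1}y) = \rho\bigl(y'[A^{-1} - A^{-1}\Delta A^{-1} + O(\|\Delta\|^2)]y\bigr) - \rho(y'A^{-1}y)$
$\quad = -\rho'(y'A^{-1}y)\,y'A^{-1}\Delta A^{-1}y + o(\|\Delta\|\,|y|^2)$.

Since $y'A^{-1}\Delta A^{-1}y = \operatorname{trace}(A^{-1}yy'A^{-1}\Delta)$, it follows that the gradient $\nabla_A$ with respect to $A \in \mathcal{P}_d$ of $\rho(y'A^{-1}y)$ is given by

(9) $\nabla_A\rho(y'A^{-1}y) = -\tfrac12 u(y'A^{-1}y)\,A^{-1}yy'A^{-1}$.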
Given $A \in \mathcal{P}_d$ let $A_t := (1 - t)I + tA \in \mathcal{P}_d$ for $0 \le t \le 1$. Then

$\rho(y'A^{-1}y) - \rho(y'y) = \int_0^1 \frac{d}{dt}\rho(y'A_t^{-1}y)\,dt = -\int_0^1 \rho'(y'A_t^{-1}y)\operatorname{trace}\bigl(A_t^{-1}yy'A_t^{-1}(A - I)\bigr)\,dt$.

For a fixed $A \in \mathcal{P}_d$, the $A_t$ are all in some compact subset of $\mathcal{P}_d$, so that their eigenvalues are bounded and bounded away from 0. From this and boundedness of $xu(x)$ for $x \ge 0$, it follows that $y \mapsto \rho(y'A^{-1}y) - \rho(y'y)$ is a bounded continuous function of $y$. We also have:

(10) For any compact $\mathcal{K} \subset \mathcal{P}_d$, $\sup\{|h(y, A)| : y \in \mathbb{R}^d,\ A \in \mathcal{K}\} < \infty$.

It follows that for an arbitrary law $Q$ on $\mathbb{R}^d$, $Qh(A)$ in (4) is defined and finite. Also, $Qh(A)$ is continuous in $A$ by dominated convergence and so lower semicontinuous.

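For any $B \in \mathcal{S}_d$ let its ordered eigenvalues be $\lambda_1(B) \ge \lambda_2(B) \ge \dots \ge \lambda_d(B)$. We have for fixed $A \in \mathcal{P}_d$ as $\Delta \to 0$, $\Delta \in \mathcal{S}_d$, that

(11) $\log\det(A + \Delta) - \log\det A = \operatorname{trace}(A^{-1}\Delta) - \|A^{-1/2}\Delta A^{-1/2}\|_F^2/2 + O(\|\Delta\|^3)$

because

$\log\det(A + \Delta) - \log\det A = \log\det\bigl(A^{-1/2}(A + \Delta)A^{-1/2}\bigr) = \log\det(I + A^{-1/2}\Delta A^{-1/2}) = \sum_{i=1}^d \log\bigl(1 + \lambda_i(A^{-1/2}\Delta A^{-1/2})\bigr)$
$\quad = \sum_{i=1}^d \lambda_i(A^{-1/2}\Delta A^{-1/2}) - \lambda_i(A^{-1/2}\Delta A^{-1/2})^2/2 + O(\|\Delta\|^3)$,

and (11) follows. By (9), and because the gradient there is bounded, derivatives can be interchanged with the integral, so we have

$Qh(A + \Delta) = Qh(A) + \tfrac12\operatorname{trace}(A^{-1}\Delta) - \int \rho'(y'A^{-1}y)\,y'A^{-1}\Delta A^{-1}y\,dQ(y) + o(\|\Delta\|)$
$\quad = Qh(A) + \tfrac12\Bigl\langle A^{-1} - \int u(y'A^{-1}y)A^{-1}yy'A^{-1}\,dQ(y),\ \Delta\Bigr\rangle + o(\|\Delta\|)$.

It follows that the gradient of the mapping $A \mapsto Qh(A)$ from $\mathcal{P}_d$ into $\mathbb{R}$ is

(12) $\nabla_A Qh(A) = \tfrac12\Bigl(A^{-1} - \int u(y'A^{-1}y)A^{-1}yy'A^{-1}\,dQ(y)\Bigr) \in \mathcal{S}_d$,

which, multiplying by $A$ on the left and right, is zero if and only if $A = \int u(y'A^{-1}y)\,yy'\,dQ(y)$, i.e. (8) holds with $B = A$. This proves the Proposition. □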
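A finite-difference check of the gradient formula (12), and of the second-order expansion (11), for an empirical $Q$ and the $t$ choice $\rho = \rho_{\nu,d}$ of (33) in Section 4, for which $u = 2\rho' = (\nu + d)/(\nu + s)$. The sample, $\nu = 3$, $A$ and the direction $D$ are arbitrary assumptions for illustration, and all function names are hypothetical. The identity $\|A^{-1/2}\Delta A^{-1/2}\|_F^2 = \operatorname{trace}(A^{-1}\Delta A^{-1}\Delta)$ avoids computing a matrix square root.

```python
import numpy as np

NU = 3.0  # degrees of freedom (illustrative)

def rho(s, d):
    """rho_{nu,d}(s) = ((nu + d)/2) log(1 + s/nu)."""
    return 0.5 * (NU + d) * np.log1p(s / NU)

def u(s, d):
    """u = 2 rho' = (nu + d)/(nu + s)."""
    return (NU + d) / (NU + s)

def quad(Y, M):
    """Quadratic forms y_i' M y_i for the rows y_i of Y."""
    return np.einsum("ij,jk,ik->i", Y, M, Y)

def Qh(A, Y):
    """Qh(A) of (4) for the empirical law of the rows of Y."""
    d = A.shape[0]
    s_A = quad(Y, np.linalg.inv(A))
    s_I = quad(Y, np.eye(d))
    return 0.5 * np.linalg.slogdet(A)[1] + np.mean(rho(s_A, d) - rho(s_I, d))

def grad_Qh(A, Y):
    """Gradient (12): (1/2)(A^{-1} - mean_i u(y_i'A^{-1}y_i) A^{-1} y_i y_i' A^{-1})."""
    d = A.shape[0]
    A_inv = np.linalg.inv(A)
    Z = Y @ A_inv                       # rows are (A^{-1} y_i)'
    w = u(quad(Y, A_inv), d)
    return 0.5 * (A_inv - (w[:, None] * Z).T @ Z / len(Y))

def logdet_remainder(A, Delta):
    """log det(A+Delta) - log det A - [tr(A^{-1}Delta) - tr((A^{-1}Delta)^2)/2]; O(|Delta|^3) by (11)."""
    B = np.linalg.inv(A) @ Delta
    lhs = np.linalg.slogdet(A + Delta)[1] - np.linalg.slogdet(A)[1]
    return lhs - (np.trace(B) - np.trace(B @ B) / 2)

rng = np.random.default_rng(0)
Y = rng.standard_t(3, size=(60, 2))
A = np.array([[2.0, 0.3], [0.3, 1.0]])
D = np.array([[0.4, -0.2], [-0.2, 0.1]])
eps = 1e-5
fd = (Qh(A + eps * D, Y) - Qh(A - eps * D, Y)) / (2 * eps)   # central difference
analytic = np.sum(grad_Qh(A, Y) * D)                         # <grad, D> = trace(grad' D)
print(fd, analytic)
```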

The following extends to any law $Q$ the uniqueness part of Kent and Tyler (1991, Theorem 2.2).

Proposition 2. Under the hypotheses of Proposition 1 on $\rho$ and $u(\cdot)$, if in addition $u(\cdot)$ is nonincreasing and $s \mapsto su(s)$ is strictly increasing on $[0, \infty)$, then for any law $Q$ on $\mathbb{R}^d$, $Qh$ has at most one critical point $A \in \mathcal{P}_d$.

Proof. By Proposition 1 suppose that (8) holds for $B = A$ and $B = D$ for some $D \ne A$ in $\mathcal{P}_d$. By the substitution $y = A^{1/2}z$ we can assume that $A = I \ne D$.

Let $t_1$ be the largest eigenvalue of $D$. Suppose that $t_1 > 1$. Then for any $y \ne 0$, by the assumed properties of $u(\cdot)$, $u(y'D^{-1}y) \le u(t_1^{-1}y'y) < t_1u(y'y)$. It follows from (8) for $D$ and $I$ that for any $z \in \mathbb{R}^d$ with $z \ne 0$,

$z'Dz = \int u(y'D^{-1}y)(z'y)^2\,dQ(y) \le t_1\int u(y'y)(z'y)^2\,dQ(y) = t_1|z|^2$,

where the last equation implies that $Q$ is not concentrated in any $(d - 1)$-dimensional vector subspace $z'y = 0$, and so the preceding inequality is strict. Taking $z$ as an eigenvector for the eigenvalue $t_1$ gives a contradiction.

If $t_d < 1$ for the smallest eigenvalue $t_d$ of $D$, we get a symmetrical contradiction. It follows that $D = I$, proving the Proposition. □
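A minimal sketch of a fixed-point iteration for (8) with $Q = Q_n$, of the kind studied by Kent and Tyler (1991), using $u = u_{\nu,d}(s) = (\nu + d)/(\nu + s)$, which is nonincreasing with $su(s)$ strictly increasing, so Proposition 2 applies. Started from two very different matrices, it reaches the same solution, consistent with uniqueness. Convergence of the iteration is assumed here rather than proved, and the data, $\nu$, starting points and the name `scatter_fixed_point` are illustrative.

```python
import numpy as np

def scatter_fixed_point(Y, u, B0, tol=1e-11, max_iter=10_000):
    """Iterate B <- (1/n) sum_i u(y_i' B^{-1} y_i) y_i y_i', the empirical form of (8)."""
    B = B0
    for _ in range(max_iter):
        s = np.einsum("ij,jk,ik->i", Y, np.linalg.inv(B), Y)
        B_new = (u(s)[:, None] * Y).T @ Y / len(Y)
        if np.abs(B_new - B).max() < tol:
            return B_new
        B = B_new
    raise RuntimeError("fixed-point iteration did not converge")

nu, d = 2.0, 3

def u(s):
    """u_{nu,d}(s): nonincreasing, with s * u(s) strictly increasing."""
    return (nu + d) / (nu + s)

rng = np.random.default_rng(2)
Y = rng.standard_t(2, size=(300, d))          # heavy-tailed data with a density
B1 = scatter_fixed_point(Y, u, np.eye(d))
B2 = scatter_fixed_point(Y, u, np.diag([50.0, 0.1, 3.0]))
print(np.abs(B1 - B2).max())
```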

We saw in the preceding proof that if there is a critical point, $Q$ is not concentrated in any proper linear subspace. More precisely, a sufficient condition for existence of a minimum (unique by Proposition 2) will include the following assumption from Kent and Tyler (1991, (2.4)). For a given function $u(\cdot)$ as in Proposition 2, let $a_0 := a_0(u(\cdot)) := \sup_{s>0} su(s)$. Since $s \mapsto su(s)$ is increasing, we will have

(13) $su(s) \uparrow a_0$ as $s \uparrow +\infty$.

Kent and Tyler (1991) gave the following conditions for empirical measures.

Definition. For a given number $a_0 := a(0) > 0$ let $\mathcal{U}_{d,a(0)}$ be the set of all probability measures $Q$ on $\mathbb{R}^d$ such that for every linear subspace $H$ of dimension $q \le d - 1$, $Q(H) < 1 - (d - q)/a_0$, so that $Q(H^c) > (d - q)/a_0$.

If $Q \in \mathcal{U}_{d,a(0)}$, then $Q(\{0\}) < 1 - (d/a_0)$, which is impossible if $a_0 \le d$. So we will need $a_0 > d$ and assume it, e.g. in the following theorem. In the $t_\nu$ case later we will have $a_0 = \nu + d > d$ for any $\nu > 0$. For $a(0) > d$, $\mathcal{U}_{d,a(0)}$ is weakly open and dense and contains all laws with densities. In part (b), Kent and Tyler (1991, Theorems 2.1 and 2.2) proved that there is a unique $B(Q_n)$ minimizing $Q_nh$ for an empirical $Q_n \in \mathcal{U}_{d,a(0)}$.
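For an empirical measure the conditions defining $\mathcal{U}_{d,a(0)}$ can be checked directly. The following sketch is restricted, for simplicity, to $d = 2$ and integer-valued data (assumptions made so that collinearity can be decided exactly with `fractions.Fraction`). It checks $Q_n(\{0\}) < 1 - 2/a_0$ and $Q_n(H) < 1 - 1/a_0$ for every line $H$ through 0; in the $t_\nu$ case one would take $a_0 = \nu + d$. The function name `in_U2` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def in_U2(points, a0):
    """Check Q_n in U_{2,a0} for the empirical law of integer points in R^2.

    Linear subspaces of dimension q <= 1 in R^2 are {0} (q = 0) and lines through 0 (q = 1).
    """
    n = len(points)
    a0 = Fraction(a0)
    n_zero = sum(1 for x, y in points if x == 0 and y == 0)
    if not Fraction(n_zero, n) < 1 - Fraction(2) / a0:      # H = {0}
        return False
    directions = Counter()
    for x, y in points:
        if (x, y) == (0, 0):
            continue
        g = gcd(x, y)                                       # g > 0 for a nonzero point
        dx, dy = x // g, y // g
        if dx < 0 or (dx == 0 and dy < 0):                  # identify v with -v
            dx, dy = -dx, -dy
        directions[(dx, dy)] += 1
    # Each line through 0 also carries the mass at 0; a line containing no nonzero
    # point carries only that mass, which already satisfies the weaker q = 1 bound.
    worst = max(directions.values(), default=0) + n_zero
    return Fraction(worst, n) < 1 - Fraction(1) / a0

# nu = 1, d = 2, so a0 = 3: need Q_n({0}) < 1/3 and Q_n(line) < 2/3.
print(in_U2([(1, 1), (2, 2), (3, 3), (1, 0)], 3))     # 3/4 of the mass on one line
print(in_U2([(1, 0), (0, 1), (1, 1), (-1, 1)], 3))
```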
Theorem 3. Let $u(\cdot) \ge 0$ be a bounded continuous function on $[0, \infty)$ satisfying (7), with $u(\cdot)$ nonincreasing and $s \mapsto su(s)$ strictly increasing. Then for $a(0) = a_0$ as in (13), if $a_0 > d$:

(a) If $Q \notin \mathcal{U}_{d,a(0)}$, then $Qh$ has no critical points.

(b) If $Q \in \mathcal{U}_{d,a(0)}$, then $Qh$ attains its minimum at a unique $B = B(Q) \in \mathcal{P}_d$ and has no other critical points.

Proof. (a): Tyler (1988, (2.3)) showed that the condition $Q(H) \le 1 - (d - q)/a_0$ for all linear subspaces $H$ of dimension $q > 0$ is necessary for the existence of a critical point as in (8) for $Q = Q_n$. His proof shows necessity of the stronger condition $Q_n \in \mathcal{U}_{d,a(0)}$ when $su(s) < a_0$ for all $s < \infty$ (then the inequality in Tyler [1988, (4.2)] is strict) and also applies when $q = 0$, so that $H = \{0\}$. The proof extends to general $Q$, using (7) for integrability.

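(b): For any $A$ in $\mathcal{P}_d$, let the eigenvalues of $A^{-1}$ be $\tau_1 \le \tau_2 \le \dots \le \tau_d$, where $\tau_j = \tau_j(A)$ for each $j$. Let $A$ be diagonalized. Then, varying $A$ only among matrices diagonalized in the same coordinates, by (4),

(14) $\dfrac{\partial Qh(A)}{\partial\tau_j} = -\dfrac{1}{2\tau_j} + \dfrac12\int y_j^2\,u\Bigl(\sum_{i=1}^d \tau_iy_i^2\Bigr)dQ(y)$.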

Claim 1: For some $\delta_0 > 0$,

(15) $\inf\{Qh(A) : \tau_1(A) < \delta_0/2\} \ge (\log 2)/4 + \inf\{Qh(A) : \tau_1(A) \ge \delta_0\}$.

To prove Claim 1, we have $xu(x) \downarrow 0$ as $x \downarrow 0$ since $u(\cdot)$ is right-continuous at 0, and so by dominated convergence using (7), there is a $\delta_0 > 0$, not depending on the choice of Euclidean coordinates, such that for any $t < \delta_0$, $\int t|y|^2u(t|y|^2)\,dQ(y) < 1/2$. We can take $\delta_0 < 1$. Then, since $s \mapsto su(s)$ is increasing, it follows that for each $j = 1, \dots, d$, if $\tau_j < \delta_0$ then $\tau_j\int y_j^2u(\tau_jy_j^2)\,dQ(y) < 1/2$, and so $\tau_j\int y_j^2u\bigl(\sum_{i=1}^d \tau_iy_i^2\bigr)dQ(y) < 1/2$ since $u(\cdot)$ is nonincreasing. It follows by (14) that

(16) $\partial Qh(A)/\partial\tau_j < -1/(4\tau_j)$, $\tau_j < \delta_0$.

If $\tau_1 < \delta_0/2$, let $r$ be the largest index $j \le d$ such that $\tau_j < \delta_0$. For any $0 < \zeta_1 \le \dots \le \zeta_d$ let $A(\zeta_1, \dots, \zeta_d)$ be the diagonal matrix with diagonal entries $1/\zeta_1, \dots, 1/\zeta_d$. Starting at $\tau_1, \dots, \tau_d$ and letting $\zeta_j$ increase from $\tau_j$ up to $\delta_0$ for $j = r, r - 1, \dots, 1$ in that order, we get, specifically at the final step for $\zeta_1$,

(17) $Qh(A(\tau_1, \dots, \tau_d)) - Qh(A(\delta_0, \dots, \delta_0, \tau_{r+1}, \dots, \tau_d)) \ge (\log 2)/4$.
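Spelled out, the bound (17) comes from integrating (16). At each step $j = r, \dots, 2$ the moving coordinate $\zeta_j$ stays below $\delta_0$, so by (16) $Qh$ does not increase; at the final step only $\zeta_1$ moves, from $\tau_1$ to $\delta_0$, and (16) gives

```latex
Qh\bigl(A(\tau_1,\dots,\tau_d)\bigr) - Qh\bigl(A(\delta_0,\dots,\delta_0,\tau_{r+1},\dots,\tau_d)\bigr)
  \;\ge\; \int_{\tau_1}^{\delta_0} \frac{d\zeta_1}{4\zeta_1}
  \;=\; \frac{1}{4}\log\frac{\delta_0}{\tau_1}
  \;>\; \frac{\log 2}{4},
  \qquad \text{since } \tau_1 < \delta_0/2 .
```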

So (15) follows, for any small enough $\delta_0 > 0$, and Claim 1 is proved. At this stage we have not shown that either of the infima in (15) is finite.

Let $\mathcal{A}_0 := \{A \in \mathcal{P}_d : \tau_1(A) \ge \delta_0\}$. Then by iterating for $\delta_0$ divided by powers of 2, we find that for $k = 1, 2, \dots$, for any $A \in \mathcal{P}_d$ with $\tau_1(A) < \delta_0/2^k$, there is an $A' \in \mathcal{A}_0$ with $\tau_j(A') = \tau_j(A)$ whenever $\tau_j(A) \ge \delta_0$ and

(18) $Qh(A) \ge Qh(A') + k(\log 2)/4$.

Let $\delta_1 := \delta_0/2 < 1/2$. Then by (18),

(19) $\inf\{Qh(A) : \tau_1(A) < \delta_1\} \ge (\log 2)/4 + \inf\{Qh(A) : \tau_1(A) \ge \delta_1\}$.

Next, Claim 2 is that if $\{A_k\}$ is a sequence in $\mathcal{P}_d$, with $\tau_{j,k} := \tau_j(A_k)$ for each $j$ and $k$, such that $\tau_{d,k} \to +\infty$, with $\tau_{1,k} \ge \delta_1$ for all $k$, then $Qh(A_k) \to +\infty$. If not, then taking subsequences, we can assume the following:

(i) $\tau_{d,k} \uparrow +\infty$;

(ii) For some $r = 1, \dots, d$, $\tau_{r,k} \to +\infty$, while for $j = 1, \dots, r - 1$, $\tau_{j,k}$ is bounded;

(iii) For each $j = r, \dots, d$, $1 < \tau_{j,k} \uparrow +\infty$;

(iv) For each $k = 1, 2, \dots$, let $\{e_{j,k}\}_{j=1}^d$ be an orthonormal basis of eigenvectors of $A_k$ in $\mathbb{R}^d$, where $A_k^{-1}e_{j,k} = \tau_{j,k}e_{j,k}$. As $k \to \infty$, for each $j = 1, \dots, d$, $e_{j,k}$ converges to some $e_j$.

Then $\{e_j\}_{j=1}^d$ is an orthonormal basis of $\mathbb{R}^d$. Let $S_j$ be the linear span of $e_1, \dots, e_j$ for $j = 1, \dots, d$, $S_0 := \{0\}$, $D_j := S_j \setminus S_{j-1}$ for $j = 1, \dots, d$, and $D_0 := \{0\}$. We have by (4) that $Qh(A_k) = \sum_{j=1}^d c_{j,k}$ where for $j = 1, \dots, d$,

(20) $c_{j,k} := -\tfrac12\log\tau_{j,k} + \int_{D_j} \rho(y'A_k^{-1}y) - \rho(y'y)\,dQ(y)$,

noting that on $D_0$ the integrand is 0. So we need to show that $\sum_{j=1}^d c_{j,k} \to +\infty$.
If we add and subtract $\rho(\delta_1y'y)$ in the integrand and note that $\rho(y'y) - \rho(\delta_1y'y)$ is a fixed bounded and thus integrable function, by (10), letting

(21) $\gamma_{j,k} := -\tfrac12\log\tau_{j,k} + \int_{D_j} \rho(y'A_k^{-1}y) - \rho(\delta_1y'y)\,dQ(y)$,

we need to show that $\sum_{j=1}^d \gamma_{j,k} \to +\infty$. Since $\tau_{j,k} \ge \delta_1$ for all $j$ and $k$, and by (ii), the $\gamma_{j,k}$ are bounded below for $j = 1, \dots, r - 1$. Because $Q \in \mathcal{U}_{d,a(0)}$, there is an $a$ with $d < a < a_0$ close enough to $a_0$ so that for $j = r, \dots, d$,

(22) $\alpha_j := 1 - \dfrac{d - j + 1}{a} - Q(S_{j-1}) > 0$,

noting that $S_{j-1}$ is a linear subspace of dimension $j - 1$ not depending on $k$. It will be shown that as $k \to \infty$,

(23) $\Gamma_m := -\dfrac{a\alpha_m}{2}\log\tau_{m,k} + \sum_{j=m}^d \gamma_{j,k} \to +\infty$

for $m = r, \dots, d$, which for $m = r$ will imply Claim 2. The relation (23) will be proved by downward induction from $m = d$ to $m = r$.
For coordinates $y_j := e_j'y$, each $\varepsilon > 0$ and $j = r, \dots, d$, we have

(24) rjAe'j,kyy > (1 - 

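for $k \ge k_{0j}$ for some $k_{0j}$. Choose $\varepsilon$ with $0 < \varepsilon < 1 - \delta_1$. Let $k_0 := \max_{r \le j \le d} k_{0j}$, so that for $k \ge k_0$, as will be assumed from here on, (24) will hold for all $j = r, \dots, d$. It follows then, since $\tau_{i,k} \ge \delta_1$ for all $i$, that

(25) $\rho(y'A_k^{-1}y) \ge \rho\bigl(\delta_1y'y + (1 - \varepsilon - \delta_1)\tau_{j,k}y_j^2\bigr)$

for $j = r, \dots, d$. For such $j$ it follows that

$\gamma_{j,k} \ge \gamma'_{j,k} := -\tfrac12\log\tau_{j,k} + \int_{D_j} \rho\bigl(\delta_1y'y + (1 - \varepsilon - \delta_1)\tau_{j,k}y_j^2\bigr) - \rho(\delta_1y'y)\,dQ(y)$.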
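For $j = r, \dots, d$ and $\tau > \delta_1 > 0$ we have

$0 \le \tau\dfrac{\partial}{\partial\tau}\bigl[\rho(\delta_1y'y + (1 - \varepsilon - \delta_1)\tau y_j^2) - \rho(\delta_1y'y)\bigr] = \tfrac12(1 - \varepsilon - \delta_1)\tau y_j^2\,u\bigl(\delta_1y'y + (1 - \varepsilon - \delta_1)\tau y_j^2\bigr) < \dfrac{a_0}{2}$,

and the quantity bounded above by $a_0/2$ converges to $a_0/2$ as $\tau \to +\infty$ by (13) for all $y \in D_j$, since $y_j \ne 0$ there. Because the derivative is bounded, the differentiation can be interchanged with the integral, and we have

$\dfrac{\partial\gamma'_{j,k}}{\partial\tau_{j,k}} = \dfrac{1}{2\tau_{j,k}}\Bigl[(1 - \varepsilon - \delta_1)\int_{D_j} \tau_{j,k}y_j^2\,u\bigl(\delta_1y'y + (1 - \varepsilon - \delta_1)\tau_{j,k}y_j^2\bigr)\,dQ(y) - 1\Bigr]$,

where the quantity in square brackets converges to $a_0Q(D_j) - 1$ as $k \to \infty$, and so

$\partial\gamma'_{j,k}/\partial\tau_{j,k} \sim [a_0Q(D_j) - 1]/(2\tau_{j,k})$.

Choose $a_1$ with $a < a_1 < a_0$. It follows that for $k$ large enough

(26) $\gamma_{j,k} \ge \tfrac12[a_1Q(D_j) - 1]\ln(\tau_{j,k})$,

with equality if $Q(D_j) = 0$ and strict inequality otherwise.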

Now beginning the inductive proof of (23) for $m = d$, we have $\alpha_d = 1 - a^{-1} - Q(S_{d-1}) = Q(D_d) - a^{-1}$, so $(1 + a\alpha_d)/2 = aQ(D_d)/2$, and $\gamma_{d,k} - (a\alpha_d/2)\log\tau_{d,k} \to +\infty$ by (26) for $j = d$.

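For the induction step in (23) from $j + 1$ to $j$ for $j = d - 1, \dots, r$ if $r < d$, it will suffice to show that

$\Gamma_j - \Gamma_{j+1} = \gamma_{j,k} + \dfrac{a\alpha_{j+1}}{2}\log\tau_{j+1,k} - \dfrac{a\alpha_j}{2}\log\tau_{j,k}$

is bounded below. Since $a > 0$, $\alpha_{j+1} > 0$ by (22), and $\tau_{j+1,k} \ge \tau_{j,k}$, it will be enough to show that

$\Delta_{j,k} := \gamma_{j,k} + \dfrac{a}{2}(\alpha_{j+1} - \alpha_j)\log\tau_{j,k}$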






R. M. DUDLEY, S. SIDENKO, AND Z. WANG 



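is bounded below. Inserting the definitions of $\alpha_j$ and $\alpha_{j+1}$ from (22), and using $Q(S_j) - Q(S_{j-1}) = Q(D_j)$, gives

$\Delta_{j,k} = \gamma_{j,k} + \tfrac12[1 - aQ(D_j)]\log\tau_{j,k}$.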
This is identically 0 if $Q(D_j) = 0$. If $Q(D_j) > 0$, then $\Delta_{j,k} \to +\infty$ by (26) for $j$. The inductive proof of (23) and so of Claim 2 is complete.
By ([HD, ([inD, and Claim 2, we then have 

(27) $Qh(A) \to +\infty$ if $\tau_1(A) \to 0$ or $\tau_d(A) \to +\infty$ or both, $A \in \mathcal{P}_d$.

The infimum of $Qh(A)$ equals the infimum over the set $\mathcal{K}$ of $A$ with $\tau_1(A) \ge \delta_1$ by (19) and $\tau_d(A) \le M$ for some $M < \infty$ by Claim 2. Then $\mathcal{K}$ is compact. Since $Qh$ is continuous, in fact $C^1$, it attains an absolute minimum over $\mathcal{K}$ at some $B$ in $\mathcal{K}$, where its value is finite and it has a critical point. By Claims 1 and 2 again, $Qh(B) < \inf_{A \notin \mathcal{K}} Qh(A)$. Thus $Qh$ has a unique critical point $B$ by Proposition 2, and $Qh$ has its unique absolute minimum at $B$. So the theorem is proved. □



4. Location and scatter t functionals

The main result of this section, Theorem 6, is an extension of results of Kent and Tyler (1991, Theorem 3.1), who found maximum likelihood estimates for finite samples, and of Dümbgen and Tyler (2005) for M-functionals, defined as unique critical points, for integer $\nu$, to the case of M-functionals in the sense of absolute minima and any $\nu > 0$.

Kent and Tyler (1991, §3) and Kent, Tyler and Vardi (1994) showed that location-scatter problems in $\mathbb{R}^d$ can be treated by way of pure scatter problems in $\mathbb{R}^{d+1}$, specifically for functionals based on $t$ log likelihoods. The two papers prove the following (clearly $A$ is analytic as a function of $\Sigma$, $\mu$ and $\gamma$, and the inverse of an analytic function, if it exists and is $C^1$, is analytic, e.g. Deimling [1985, Theorem 15.3, p. 151]):

Proposition 4. (i) For any $d = 1, 2, \dots$, there is a 1-1 correspondence between matrices $A \in \mathcal{P}_{d+1}$ and triples $(\Sigma, \mu, \gamma)$ where $\Sigma \in \mathcal{P}_d$, $\mu \in \mathbb{R}^d$, and $\gamma > 0$, given by $A = A(\Sigma, \mu, \gamma)$ where

(28) $A(\Sigma, \mu, \gamma) = \gamma\begin{pmatrix} \Sigma + \mu\mu' & \mu \\ \mu' & 1 \end{pmatrix}$.

The correspondence is analytic in either direction.
(a) For $A = A(\Sigma, \mu, \gamma)$, we have
For M-estimation of location and scatter in $\mathbb{R}^d$, we will have a function $\rho : [0, \infty) \to [0, \infty)$ as in the previous section. The parameter space is now the set of pairs $(\mu, \Sigma)$ for $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathcal{P}_d$, and we have a multivariate $\rho$ function (the two meanings of $\rho$ should not cause confusion)

(31) $\rho(y, (\mu, \Sigma)) := \tfrac12\log\det\Sigma + \rho\bigl((y - \mu)'\Sigma^{-1}(y - \mu)\bigr)$.

For any $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathcal{P}_d$ let $A_0 := A_0(\mu, \Sigma) := A(\Sigma, \mu, 1) \in \mathcal{P}_{d+1}$ by (28) with $\gamma = 1$, noting that $\det A_0 = \det\Sigma$. Now $\rho$ can be adjusted, in light of (10) and (30), by defining

(32) $h(y, (\mu, \Sigma)) := \rho(y, (\mu, \Sigma)) - \rho(y, (0, I))$.
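The matrices $A(\Sigma, \mu, \gamma)$ are easy to build and invert numerically. The sketch below (hypothetical helper names `A_of` and `params_of`) uses the block form $A(\Sigma, \mu, \gamma) = \gamma\begin{pmatrix} \Sigma + \mu\mu' & \mu \\ \mu' & 1 \end{pmatrix}$ and checks two consequences of elementary block-matrix algebra: $\det A = \gamma^{d+1}\det\Sigma$, which gives $\det A_0 = \det\Sigma$ when $\gamma = 1$, and $z'A^{-1}z = [1 + (y - \mu)'\Sigma^{-1}(y - \mu)]/\gamma$ for $z = (y', 1)'$. The numerical values are arbitrary.

```python
import numpy as np

def A_of(Sigma, mu, gamma):
    """gamma * [[Sigma + mu mu', mu], [mu', 1]], a (d+1) x (d+1) matrix."""
    d = len(mu)
    A = np.empty((d + 1, d + 1))
    A[:d, :d] = Sigma + np.outer(mu, mu)
    A[:d, d] = mu
    A[d, :d] = mu
    A[d, d] = 1.0
    return gamma * A

def params_of(A):
    """Invert the map: gamma = A[d, d], mu = A[:d, d]/gamma, Sigma = A[:d, :d]/gamma - mu mu'."""
    gamma = A[-1, -1]
    mu = A[:-1, -1] / gamma
    Sigma = A[:-1, :-1] / gamma - np.outer(mu, mu)
    return Sigma, mu, gamma

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mu = np.array([1.0, -2.0])
gamma = 1.5
A = A_of(Sigma, mu, gamma)
y = np.array([0.5, 4.0])
z = np.append(y, 1.0)                           # z = (y', 1)'
r = y - mu
lhs = z @ np.linalg.solve(A, z)                 # z' A^{-1} z
rhs = (1.0 + r @ np.linalg.solve(Sigma, r)) / gamma
print(np.linalg.det(A), gamma**3 * np.linalg.det(Sigma), lhs, rhs)
```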

Laws $P$ on $\mathbb{R}^d$ correspond to laws $Q := P \circ T_1^{-1}$ on $\mathbb{R}^{d+1}$ concentrated in $\{y : y_{d+1} = 1\}$, where $T_1(y) := (y', 1)' \in \mathbb{R}^{d+1}$, $y \in \mathbb{R}^d$. We will need a hypothesis on $P$ corresponding to $Q \in \mathcal{U}_{d+1,a(0)}$. Kent and Tyler (1991) gave these conditions for empirical measures.

Definition. For any $a_0 := a(0) > 0$ let $\mathcal{V}_{d,a(0)}$ be the set of all laws $P$ on $\mathbb{R}^d$ such that for every affine hyperplane $J$ of dimension $q \le d - 1$, $P(J) < 1 - (d - q)/a_0$, so that $P(J^c) > (d - q)/a_0$.

The next fact is rather straightforward to prove.

Proposition 5. For any law $P$ on $\mathbb{R}^d$, $a > d + 1$, and $Q := P \circ T_1^{-1}$ on $\mathbb{R}^{d+1}$, we have $P \in \mathcal{V}_{d,a}$ if and only if $Q \in \mathcal{U}_{d+1,a}$.

For laws $P \in \mathcal{V}_{d,a(0)}$ with $a(0) > d + 1$, one can prove that there exist $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathcal{P}_d$ at which $Ph(\mu, \Sigma)$ is minimized, as Kent and Tyler (1991) did for empirical measures, by applying part of the proof of Theorem 3 restricted to the closed set where $\gamma = A_{d+1,d+1} = 1$ in (28). But the proof of uniqueness (Proposition 2) doesn't apply in general under the constraint $A_{d+1,d+1} = 1$. For minimization under a constraint the notion of critical point changes, e.g. for a Lagrange multiplier $\lambda$ one would seek critical points of $Qh(A) + \lambda(A_{d+1,d+1} - 1)$, so Propositions 1 and 2 no longer apply. Uniqueness will hold under an additional condition. A family of $\rho$ functions that will satisfy the condition, as pointed out by Kent and Tyler [1991, (1.5), (1.6)], comes from elliptically symmetric multivariate $t$ densities with $\nu$ degrees of freedom as follows: for $0 < \nu < \infty$ and $0 \le s < \infty$ let

(33) $\rho_\nu(s) := \rho_{\nu,d}(s) := \dfrac{\nu + d}{2}\log\Bigl(1 + \dfrac{s}{\nu}\Bigr)$.

For this $\rho$, $u$ is $u_\nu(s) := u_{\nu,d}(s) := (\nu + d)/(\nu + s)$, which is decreasing, and $s \mapsto su_{\nu,d}(s)$ is strictly increasing and bounded, so that (7) holds, with supremum and limit at $+\infty$ equal to $a_0(\nu) := a_0(u_\nu(\cdot)) = \nu + d > d$ for any $\nu > 0$.
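The stated properties of $\rho_{\nu,d}$ and $u_{\nu,d}$ are easy to confirm numerically. This sketch (hypothetical names `rho_t` and `u_t`, illustrative $\nu = 2.5$, $d = 3$) compares $u$ with $2\rho'$ via central differences and checks on a grid that $su(s) = (\nu + d)s/(\nu + s)$ is strictly increasing, stays below $\nu + d$, and approaches it.

```python
import math

def rho_t(s, nu, d):
    """rho_{nu,d}(s) = ((nu + d)/2) log(1 + s/nu)."""
    return 0.5 * (nu + d) * math.log1p(s / nu)

def u_t(s, nu, d):
    """u_{nu,d}(s) = (nu + d)/(nu + s)."""
    return (nu + d) / (nu + s)

nu, d, h = 2.5, 3, 1e-6
grid = [0.01 * k for k in range(1, 2001)] + [1e3, 1e6, 1e9]

# u = 2 rho': compare with a central difference quotient
max_err = max(abs((rho_t(s + h, nu, d) - rho_t(s - h, nu, d)) / h - u_t(s, nu, d))
              for s in grid[:100])
# s u(s) = (nu + d) s/(nu + s): strictly increasing, below nu + d, tending to nu + d
su = [s * u_t(s, nu, d) for s in grid]
print(max_err, su[0], su[-1], nu + d)
```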
The following fact was shown in part by Kent and Tyler (1991) and further by Kent, Tyler and Vardi (1994), for empirical measures, with a short proof, and with equation only implicit. The relation that $\nu$ degrees of freedom in dimension $d$ correspond to $\nu' = \nu - 1$ in dimension $d + 1$, due to Kent, Tyler and Vardi (1994), is implemented more thoroughly in the following theorem and the proof in Dudley (2006). The extension from empirical to general laws follows from Theorem 3, specifically for part (a) of the next theorem since $a_0 = \nu + d > d$.

Theorem 6. For any $d = 1, 2, \dots$,

(a) For any $\nu > 0$ and $Q \in \mathcal{U}_{d,\nu+d}$, the map $A \mapsto Qh(A)$ defined by (4) for $\rho = \rho_\nu$ has a unique critical point $A_{(\nu)} := A_{(\nu)}(Q)$, which is an absolute minimum.

In parts (b) through (f) let $\nu > 1$, let $P$ be a law on $\mathbb{R}^d$, $Q = P \circ T_1^{-1}$ on $\mathbb{R}^{d+1}$ and $\nu' := \nu - 1$. Assume $P \in \mathcal{V}_{d,\nu+d}$ in parts (b) through (e). We have:

(b) $A_{(\nu')d+1,d+1} = \int u_{\nu',d+1}(z'A_{(\nu')}^{-1}z)\,dQ(z) = 1$;

(c) For any $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathcal{P}_d$ let $A = A(\Sigma, \mu, 1) \in \mathcal{P}_{d+1}$ in (28). Then for any $y \in \mathbb{R}^d$ and $z := (y', 1)'$, we have

(34) $u_{\nu',d+1}(z'A^{-1}z) = u_{\nu,d}\bigl((y - \mu)'\Sigma^{-1}(y - \mu)\bigr)$.

In particular, this holds for $A = A_{(\nu')}$ and its corresponding $\mu = \mu_\nu \in \mathbb{R}^d$ and $\Sigma = \Sigma_\nu \in \mathcal{P}_d$.

(d)

(35) $\int u_{\nu,d}\bigl((y - \mu_\nu)'\Sigma_\nu^{-1}(y - \mu_\nu)\bigr)\,dP(y) = 1$.

(e) For $h := h_\nu := h_{\nu,d}$ defined by (32) with $\rho = \rho_{\nu,d}$, $(\mu_\nu, \Sigma_\nu)$ is an M-functional for $P$.

(f) If, on the other hand, $P \notin \mathcal{V}_{d,\nu+d}$, then $(\mu, \Sigma) \mapsto Ph(\mu, \Sigma)$ for $h$ as in part (e) has no critical points.
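The reduction in Theorem 6 can be illustrated numerically. In the sketch below (function names hypothetical; convergence of both iterations is assumed, not proved here), the pure scatter fixed point (8) is computed in dimension $d + 1$ with $\nu' = \nu - 1$ for the points $z_i = (y_i', 1)'$, and an EM-type iteration computes the $t_\nu$ location and scatter estimate directly in dimension $d$. Consistently with parts (b) and (c), the corner entry $A_{d+1,d+1}$ comes out equal to 1, and $(\mu, \Sigma)$ read off from $A$ via (28) with $\gamma = 1$ agrees with the direct estimate. The data and $\nu = 3$ are illustrative.

```python
import numpy as np

def u_t(s, nu, p):
    """t weight function u_{nu,p}(s) = (nu + p)/(nu + s)."""
    return (nu + p) / (nu + s)

def quad(X, M):
    """Quadratic forms x_i' M x_i for the rows x_i of X."""
    return np.einsum("ij,jk,ik->i", X, M, X)

def pure_scatter_t(Z, nu, tol=1e-11, max_iter=20_000):
    """Fixed point of A = mean_i u_{nu,p}(z_i' A^{-1} z_i) z_i z_i' in R^p (pure scatter)."""
    n, p = Z.shape
    A = np.eye(p)
    for _ in range(max_iter):
        A_new = (u_t(quad(Z, np.linalg.inv(A)), nu, p)[:, None] * Z).T @ Z / n
        if np.abs(A_new - A).max() < tol:
            return A_new
        A = A_new
    raise RuntimeError("pure scatter iteration did not converge")

def location_scatter_t(Y, nu, tol=1e-11, max_iter=20_000):
    """EM-type iteration for the t_nu location/scatter M-estimate in R^d."""
    n, d = Y.shape
    mu, Sigma = Y.mean(axis=0), np.eye(d)
    for _ in range(max_iter):
        w = u_t(quad(Y - mu, np.linalg.inv(Sigma)), nu, d)
        mu_new = w @ Y / w.sum()
        R = Y - mu_new
        Sigma_new = (w[:, None] * R).T @ R / n
        if max(np.abs(mu_new - mu).max(), np.abs(Sigma_new - Sigma).max()) < tol:
            return mu_new, Sigma_new
        mu, Sigma = mu_new, Sigma_new
    raise RuntimeError("location-scatter iteration did not converge")

nu = 3.0
rng = np.random.default_rng(3)
Y = rng.standard_t(4, size=(250, 2)) + np.array([2.0, -1.0])
Z = np.hstack([Y, np.ones((len(Y), 1))])      # z_i = T_1(y_i) = (y_i', 1)'
A = pure_scatter_t(Z, nu - 1.0)               # nu' = nu - 1 in dimension d + 1
mu_A = A[:-1, -1]                             # read off via the block form, assuming gamma = A[d, d] = 1
Sigma_A = A[:-1, :-1] - np.outer(mu_A, mu_A)
mu, Sigma = location_scatter_t(Y, nu)
print(A[-1, -1], mu_A, mu)
```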

Kent, Tyler and Vardi (1994, Theorem 3.1) showed the following. Suppose $u(s) > 0$, $u(0) < +\infty$, $u(\cdot)$ is continuous and nonincreasing for $s \ge 0$, and $su(s)$ is nondecreasing for $s \ge 0$, with $a_0 := \lim_{s\to+\infty} su(s) > d$. Suppose also that equation (35) holds with $u$ in place of $u_{\nu,d}$ at each critical point $(\mu, \Sigma)$ of $Q_n h$, for any $Q_n$. Then $u$ must be of the form $u(s) = u_{\nu,d}(s) = (\nu + d)/(\nu + s)$ for some $\nu > 0$. Thus the method of relating pure scatter functionals in $\mathbb{R}^{d+1}$ to location-scatter functionals in $\mathbb{R}^d$, given by Theorem 6 for t functionals defined by the functions $u_{\nu,d}$, does not extend directly to other functions $u$. For $0 < \nu < 1$ we would get $\nu' < 0$, so the methods of Section 3 don't apply. In fact, (unique) $t_\nu$ location and scatter M-functionals may not exist, as Gabrielsen (1982) and Kent and Tyler (1991) noted. For example, let $d = 1$, $0 < \nu < 1$, and let $P$ be symmetric around 0 and nonatomic but concentrated near $\pm 1$. Then for $-\infty < \mu < \infty$ there is a unique $\sigma_\nu(\mu) > 0$ where the minimum of $Ph_\nu(\mu, \sigma)$ with respect to $\sigma$ is attained. Then $\sigma_\nu(0) \approx 1$ and $(0, \sigma_\nu(0))$ is a saddle point of $Ph_\nu$. Minima occur at some $\mu \ne 0$, $\sigma > 0$, and at $(\mu, \sigma)$ if and only if at $(-\mu, \sigma)$. The Cauchy case $\nu = 1$ can be treated separately; see Kent, Tyler and Vardi (1994, §5) and references there.
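The example with $0 < \nu < 1$ can be illustrated numerically. Below is a minimal Python sketch. Our own choices are that $P$ is the uniform law on $[-1.01, -0.99] \cup [0.99, 1.01]$, replaced by a midpoint grid, and that $\nu = 0.5$. The minimization over $\sigma$ uses a crude log-spaced grid rather than an exact solver. The objective is $Ph_\nu(\mu,\sigma) = \int \log\sigma + \frac{\nu+1}{2}\log(1 + ((x-\mu)/\sigma)^2/\nu)\,dP(x)$. The function names are hypothetical.

```python
import math


def make_law(eps=0.01, m=200):
    """Midpoint grid for the uniform law on [-1-eps, -1+eps] U [1-eps, 1+eps]."""
    pts = []
    for c in (-1.0, 1.0):
        for k in range(m):
            pts.append(c + eps * (-1.0 + (2 * k + 1) / m))
    return pts


def Ph(mu, sigma, pts, nu):
    """Average over pts of log(sigma) + ((nu + 1)/2) log(1 + ((x - mu)/sigma)^2 / nu)."""
    total = 0.0
    for x in pts:
        s = ((x - mu) / sigma) ** 2
        total += math.log(sigma) + 0.5 * (nu + 1.0) * math.log1p(s / nu)
    return total / len(pts)


def profile(mu, pts, nu):
    """Minimum over sigma on a log grid from 1e-4 to 10 (stand-in for exact minimization)."""
    grid = [10.0 ** (-4.0 + 5.0 * i / 500) for i in range(501)]
    return min(Ph(mu, s, pts, nu) for s in grid)


pts, nu = make_law(), 0.5
print(profile(0.0, pts, nu), profile(1.0, pts, nu))   # the second value is smaller
```

At $\mu = 0$ the profile minimum sits near $\sigma = 1$, with value about $\frac34\log 3$. Moving $\mu$ slightly away from 0 lowers $Ph_\nu(\mu, 1)$, and near $\mu = \pm1$ a small $\sigma$ gives a much lower value. This is consistent with the saddle point described above.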

When $d = 1$, $P \in \mathcal{V}_{1,\nu+1}$ requires that $P(\{x\}) < \nu/(1+\nu)$ for each point $x$. Then $\Sigma$ reduces to a number $\sigma^2$ with $\sigma > 0$. If $\nu > 1$ and $P \notin \mathcal{V}_{1,\nu+1}$, then $P(\{x\}) \ge \nu/(\nu+1)$ for some $x$. This $x$ is unique, since $\nu/(\nu+1) > 1/2$. One can extend $(\mu_\nu, \sigma_\nu)$ by setting $\mu_\nu(P) := x$ and $\sigma_\nu(P) := 0$. Then $(\mu_\nu, \sigma_\nu)$ is weakly continuous at all $P$, as will be shown in Section 9.

For $d > 1$ there is no weakly continuous extension to all $P$. Such an extension of $\mu_\nu$ would give a weakly continuous affinely equivariant location functional defined for all laws, which is known to be impossible [Obenchain (1971)].

5. Differentiability of t functionals 

One can metrize weak convergence by a norm. For a bounded function $f$ from $\mathbb{R}^d$ into a normed space, the sup norm is $\|f\|_{\sup} := \sup_{x\in\mathbb{R}^d}\|f(x)\|$. Let $V$ be a $k$-dimensional real vector space with a norm $\|\cdot\|$, where $1 \le k < \infty$. Let $BL(\mathbb{R}^d, V)$ be the vector space of all functions $f$ from $\mathbb{R}^d$ into $V$ such that the norm

$$\|f\|_{BL} := \|f\|_{\sup} + \sup_{x \ne y}\|f(x) - f(y)\|/|x - y| < \infty,$$

i.e. bounded Lipschitz functions. The space $BL(\mathbb{R}^d, V)$ doesn't depend on $\|\cdot\|$, although $\|\cdot\|_{BL}$ does. Take any basis $\{v_j\}_{j=1}^k$ of $V$. Then $f(x) = \sum_{j=1}^k f_j(x)v_j$ for some $f_j \in BL(\mathbb{R}^d) := BL(\mathbb{R}^d, \mathbb{R})$, where $\mathbb{R}$ has its usual norm $|\cdot|$. Let $X := BL^*(\mathbb{R}^d)$ be the dual Banach space. For $\phi \in X$, let

$$\phi^* f := \sum_{j=1}^k \phi(f_j)\,v_j \in V.$$

Then because $\phi$ is linear, $\phi^* f$ doesn't depend on the choice of basis.

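Let $\mathcal{P}(\mathbb{R}^d)$ be the set of all probability measures on the Borel sets of $\mathbb{R}^d$. Then each $Q \in \mathcal{P}(\mathbb{R}^d)$ defines a $\phi_Q \in BL^*(\mathbb{R}^d)$ via $\phi_Q(f) := \int f\,dQ$. For any $P, Q \in \mathcal{P}(\mathbb{R}^d)$ let $\beta(P, Q) := \|P - Q\|^*_{BL} := \|\phi_P - \phi_Q\|^*_{BL}$. Then $\beta$ is a metric on $\mathcal{P}(\mathbb{R}^d)$ which metrizes the weak topology, e.g. Dudley (2002, Theorem 11.3.3).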

Let $U$ be an open set in a Euclidean space $\mathbb{R}^d$. For $k = 1, 2, \ldots$, let $C^k_b(U)$ be the space of all real-valued functions $f$ on $U$ such that all partial derivatives $D^p f$ are continuous and bounded on $U$, for $D^p := \partial^{[p]}/\partial x_1^{p_1}\cdots\partial x_d^{p_d}$ and $0 \le [p] := p_1 + \cdots + p_d \le k$. Here $D^0 f \equiv f$. On $C^k_b(U)$ we have the norm

(36) $$\|f\|_{k,U} := \sum_{0 \le [p] \le k}\|D^p f\|_{\sup,U}, \quad\text{where } \|g\|_{\sup,U} := \sup_{x\in U}|g(x)|.$$

Then $(C^k_b(U), \|\cdot\|_{k,U})$ is a Banach space. For $k = 1$ and $U$ convex in $\mathbb{R}^d$ it's easily seen that $C^1_b(U)$ is a subspace of $BL(U, \mathbb{R})$, with equal norm for $d = 1$.
Substituting $\rho_{\nu,d}$ from (33) into (2) gives, for $y \in \mathbb{R}^d$ and $A \in \mathcal{P}_d$,

(37) $$L_\nu(y, A) := \tfrac12\log\det A + \tfrac{\nu+d}{2}\log\bigl[1 + \nu^{-1}y'A^{-1}y\bigr].$$

Reserve $h_\nu := h_{\nu,d}$ for the location-scatter case, as in Theorem 6(e). Then for the pure scatter case we get, in (3),

(38) $$H_\nu(y, A) := H_{\nu,d}(y, A) := L_{\nu,d}(y, A) - L_{\nu,d}(y, I).$$

It follows from (11) and (37) that for $A \in \mathcal{P}_d$ and $C = A^{-1}$, gradients with respect to $C$ are given by

(39) $$G_{(\nu)}(y, A) := \nabla_C H_\nu(y, A) = \nabla_C L_\nu(y, A) = -\frac{A}{2} + \frac{(\nu+d)\,yy'}{2(\nu + y'Cy)}.$$
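Formula (39) can be checked by finite differences. The directional derivative of $C \mapsto L_\nu(y, C^{-1})$ along a symmetric direction $\Delta$ should equal the Frobenius pairing $\operatorname{trace}(G_{(\nu)}(y,A)\Delta)$. $H_\nu$ differs from $L_\nu$ by a term not depending on $C$, so the same check applies to $H_\nu$. The following is a Python sketch for $d = 2$. The values of $y$, $C$, $\Delta$ and $\nu$ are arbitrary illustrations, and the function names are ours.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def quad(y, m):
    """Quadratic form y' M y for a 2-vector y."""
    return sum(y[i] * m[i][j] * y[j] for i in range(2) for j in range(2))


def L_of_C(y, c, nu, d=2):
    """L_nu(y, A) from (37), written in terms of C = A^{-1}."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return -0.5 * math.log(det) + 0.5 * (nu + d) * math.log1p(quad(y, c) / nu)


def G_formula(y, c, nu, d=2):
    """Right side of (39): -A/2 + (nu + d) y y' / (2 (nu + y'Cy))."""
    a = inv2(c)
    q = quad(y, c)
    return [[-0.5 * a[i][j] + 0.5 * (nu + d) * y[i] * y[j] / (nu + q)
             for j in range(2)] for i in range(2)]


def directional_fd(y, c, delta, nu, h=1e-6):
    """Central difference of t -> L(C + t * Delta) at t = 0."""
    cp = [[c[i][j] + h * delta[i][j] for j in range(2)] for i in range(2)]
    cm = [[c[i][j] - h * delta[i][j] for j in range(2)] for i in range(2)]
    return (L_of_C(y, cp, nu) - L_of_C(y, cm, nu)) / (2 * h)


y, nu = (1.3, -0.7), 2.5
c = [[1.5, 0.2], [0.2, 0.8]]
delta = [[0.3, -0.1], [-0.1, 0.5]]          # a symmetric direction
g = G_formula(y, c, nu)
pairing = sum(g[i][j] * delta[j][i] for i in range(2) for j in range(2))
print(directional_fd(y, c, delta, nu), pairing)   # agree to about 1e-9
```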
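For $0 < \delta < 1$ and $d = 1, 2, \ldots$, define an open subset of $\mathcal{P}_d \subset \mathcal{S}_d$ by

(40) $$W_\delta := W_{\delta,d} := \{A \in \mathcal{P}_d : \max(\|A\|, \|A^{-1}\|) < 1/\delta\}.$$

For any $A \in \mathcal{P}_d$, $C = A^{-1}$, and $L_\nu := L_{\nu,d}$, let

$$I(C, Q, H) := QH_\nu(A) = \int L_\nu(y, A) - L_\nu(y, I)\,dQ(y),$$

$$J(C, Q, H) := \tfrac12\log\det C + I(C, Q, H) = \frac{\nu+d}{2}\int\log\frac{\nu + y'Cy}{\nu + y'y}\,dQ(y).$$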



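Proposition 7. (a) The function $C \mapsto I(C, Q, H)$ is an analytic function of $C$ on the open subset $\mathcal{P}_d$ of $\mathcal{S}_d$;

(b) Its gradient is

(41) $$\nabla_C I(C, Q, H) = \frac12\Bigl((\nu + d)\int\frac{yy'}{\nu + y'Cy}\,dQ(y) - A\Bigr);$$

(c) The functional $C \mapsto J(C, Q, H)$ has the Taylor expansion around any $C \in \mathcal{P}_d$

(42) $$J(C + \Delta, Q, H) - J(C, Q, H) = \frac{\nu+d}{2}\int\sum_{j=1}^\infty\frac{(-1)^{j-1}}{j}\Bigl(\frac{y'\Delta y}{\nu + y'Cy}\Bigr)^j dQ(y),$$

convergent for $\|\Delta\| < 1/\|A\|$;

(d) For any $\delta \in (0, 1)$, $\nu > 1$ and $j = 1, 2, \ldots$, the function $C \mapsto I(C, Q, H)$ is in $C^j_b(W_{\delta,d})$.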

Proof. The term $\frac12\log\det C$ doesn't depend on $y$ and is clearly an analytic function of $C$, having derivatives of each order with respect to $C$ bounded for $A \in W_{\delta,d}$. For $\|\Delta\| < 1/\|A\|$ we have $|y'\Delta y|/(\nu + y'Cy) \le \|\Delta\|\,\|A\| < 1$ for all $y$. So we can interchange the Taylor expansion of the logarithm with the integral and get part (c), (42). Part (a) then follows, and part (b) follows from (39). For part (d), as in the Appendix, Proposition 29, the $j$th derivative $f^{(j)}$ of a functional $f$ defines a symmetric $j$-linear form $d^j f$, which in turn yields a $j$-homogeneous polynomial. Such polynomials appear in Taylor series as in the one-variable case, (95). Thus from (42), the $j$th Taylor polynomial of $C \mapsto J(C, Q, H)$, times $j!$, is given by

(43) $$d^j J(C, Q, H)\Delta^{\otimes j} = \frac{\nu+d}{2}(-1)^{j-1}(j-1)!\int\frac{(y'\Delta y)^j}{(\nu + y'Cy)^j}\,dQ(y).$$

This is clearly bounded for $\|\Delta\| \le 1$ when the eigenvalues of $C$ are bounded away from 0, in other words when $\|A\|$ is bounded above. Then the $j$th derivatives are also bounded, by facts to be mentioned just after Proposition 29. □
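The expansion (42) can be checked on a small discrete law. Below is a Python sketch for $d = 2$. The five equally weighted atoms of $Q$, the value $\nu = 2$ and the matrices $C$ and $\Delta$ are our own illustrative choices. The script verifies the hypothesis $\|\Delta\|\,\|A\| < 1$ using closed-form eigenvalues of symmetric $2\times2$ matrices, then compares $J(C+\Delta) - J(C)$ with a truncation of the series.

```python
import math

NU, D = 2.0, 2
YS = [(1.0, 0.0), (0.5, -1.2), (-2.0, 0.3), (0.1, 0.1), (3.0, 2.0)]  # atoms of Q


def quad(y, m):
    """Quadratic form y' M y for a 2-vector y."""
    return sum(y[i] * m[i][j] * y[j] for i in range(2) for j in range(2))


def eig2(m):
    """Eigenvalues (smaller, larger) of a symmetric 2x2 matrix."""
    half_tr = 0.5 * (m[0][0] + m[1][1])
    r = math.sqrt(0.25 * (m[0][0] - m[1][1]) ** 2 + m[0][1] ** 2)
    return half_tr - r, half_tr + r


def J(c):
    """J(C, Q, H) = ((nu + d)/2) * integral of log((nu + y'Cy)/(nu + y'y)) dQ(y)."""
    vals = [math.log((NU + quad(y, c)) / (NU + y[0] ** 2 + y[1] ** 2)) for y in YS]
    return 0.5 * (NU + D) * sum(vals) / len(YS)


def taylor_42(c, delta, terms=60):
    """Truncation of the series (42) for J(C + Delta) - J(C)."""
    total = 0.0
    for j in range(1, terms + 1):
        mom = sum((quad(y, delta) / (NU + quad(y, c))) ** j for y in YS) / len(YS)
        total += (-1) ** (j - 1) / j * mom
    return 0.5 * (NU + D) * total


c = [[2.0, 0.3], [0.3, 1.0]]
delta = [[0.2, 0.1], [0.1, -0.3]]
norm_a = 1.0 / eig2(c)[0]                       # ||A|| = 1 / (smallest eigenvalue of C)
norm_delta = max(abs(e) for e in eig2(delta))   # operator norm of Delta
assert norm_delta * norm_a < 1                  # hypothesis of Proposition 7(c)
c_plus = [[c[i][j] + delta[i][j] for j in range(2)] for i in range(2)]
print(J(c_plus) - J(c), taylor_42(c, delta))
```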

To treat t functionals of location and scatter in any dimension $p$ we will need functionals of pure scatter in dimension $p + 1$, so in the following lemma we only need dimension $d \ge 2$.

Usually, one might show that the Hessian is positive definite at a critical point in order to show it is a strict relative minimum. In our case we already know from Theorem 6(a) that we have a unique critical point which is a strict absolute minimum. The following lemma will be useful instead in showing differentiability of t functionals via implicit function theorems, since it implies that the derivative of the gradient (the Hessian) is non-singular.

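Lemma 8. Let $\nu > 0$, $d = 2, 3, \ldots$, and $Q \in \mathcal{U}_{d,\nu+d}$. Let $A_{(\nu)} = A_\nu(Q) \in \mathcal{P}_d$ be given by Theorem 6(a), and let $H_\nu = H_{\nu,d}$ be defined by (38). Then at $A_{(\nu)}$, the Hessian of $QH_\nu$ on $\mathcal{S}_d$ with respect to $C = A^{-1}$ is positive definite.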

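Proof. Each side of (42) equals

$$\frac{\nu+d}{2}\int\Bigl[\frac{y'\Delta y}{\nu + y'Cy} - \frac{(y'\Delta y)^2}{2(\nu + y'Cy)^2}\Bigr]dQ(y) + O(\|\Delta\|^3).$$

Consider the second-order term in the Taylor expansion of $C \mapsto I(C, Q, H)$, e.g. (95) in the Appendix, using also (11) with $C$ in place of $A$. For $\Delta \in \mathcal{S}_d$, it is the quadratic form

(44) $$\Delta \mapsto \frac14\Bigl(\operatorname{trace}(A\Delta A\Delta) - (\nu + d)\int\frac{(y'\Delta y)^2}{(\nu + y'Cy)^2}\,dQ(y)\Bigr).$$

(Since differences of matrices in $\mathcal{P}_d$ are in $\mathcal{S}_d$, it suffices to consider $\Delta \in \mathcal{S}_d$.) Let $\mathcal{H}_{2,A}$ be the Hessian bilinear form (2-linear mapping) from $\mathcal{S}_d \times \mathcal{S}_d$ into $\mathbb{R}$, defined by the second derivative of $C \mapsto I(C, Q, H)$ at $C = A^{-1}$. It is positive definite if and only if the quadratic form (44) is positive definite. The Hessian also defines a linear map $\mathcal{H}_A$ from $\mathcal{S}_d$ into itself via the Frobenius inner product,

(45) $$\langle\mathcal{H}_A(B), D\rangle = \operatorname{trace}(\mathcal{H}_A(B)D) = \mathcal{H}_{2,A}(B, D)$$

for all $B, D \in \mathcal{S}_d$. Since $A \mapsto A^{-1}$ is $C^\infty$ with $C^\infty$ inverse from $\mathcal{P}_d$ onto itself, it suffices to consider $QH$ as a function of $C = A^{-1}$, in other words, to consider $I(C, Q, H)$. Then we need to show that (44) is positive definite in $\Delta \in \mathcal{S}_d$ at the unique $A = A_\nu(Q) \in \mathcal{P}_d$ such that $\nabla_C I(C, Q, H) = 0$ in (41). Equivalently, $(\nu + d)\int yy'/(\nu + y'Cy)\,dQ(y) = A$. Make the substitution $z := A^{-1/2}y$, and consequently replace $Q$ by $q$ with $dq(z) = dQ(y)$ and $\Delta$ by $A^{1/2}\Delta A^{1/2}$. Then we get $I = A_\nu(q)$, so it suffices to prove the lemma for $(I, q)$ in place of $(A, Q)$. We need to show that

(46) $$\|\Delta\|_F^2 > (\nu + d)\int\frac{(z'\Delta z)^2}{(\nu + z'z)^2}\,dq(z)$$

for each $\Delta \ne 0$ in $\mathcal{S}_d$, where $\|\Delta\|_F^2 = \operatorname{trace}(\Delta^2)$. By the Cauchy inequality $(z'\Delta z)^2 \le (z'z)(z'\Delta^2 z)$, we have

$$(\nu + d)\int\frac{(z'\Delta z)^2}{(\nu + z'z)^2}\,dq(z) \le (\nu + d)\int\frac{(z'z)(z'\Delta^2 z)}{(\nu + z'z)^2}\,dq(z)$$
$$\le (\nu + d)\int\frac{z'\Delta^2 z}{\nu + z'z}\,dq(z) = \operatorname{trace}\Bigl(\Delta^2(\nu + d)\int\frac{zz'}{\nu + z'z}\,dq(z)\Bigr) = \operatorname{trace}(\Delta^2) = \|\Delta\|_F^2,$$

using (41) with $A = C = I$. Now, $z'z < \nu + z'z$ for all $z$. Also, $z'\Delta^2 z = 0$ only for $z$ with $\Delta z = 0$, a linear subspace of dimension at most $d - 1$. Thus $q(\Delta z = 0) < 1$, so the second inequality is strict. This proves (46), and the Lemma is proved. □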

Example. This example shows three things for $Q$ with $A_\nu(Q) = I_d$, the $d \times d$ identity matrix: a large part of the mass of $Q$ can escape to infinity, $Q$ can approach the boundary of $\mathcal{U}_{d,\nu+d}$, and some eigenvalues of the Hessian can approach 0. Let $e_j$ be the standard basis vectors of $\mathbb{R}^d$. For $c > 0$ and $p$ such that $1/[2(\nu + d)] < p < 1/(2d)$, let

$$Q := (1 - 2dp)\delta_0 + p\sum_{j=1}^d(\delta_{-ce_j} + \delta_{ce_j}).$$

To get $A_\nu(Q) = I_d$, by (41) we need $(\nu + d)\cdot 2pc^2 = \nu + c^2$, or $\nu = c^2[2p(\nu + d) - 1]$. There is a unique solution $c > 0$, but as $p \downarrow 1/[2(\nu + d)]$, we have $c \uparrow +\infty$. Take any $q = 0, 1, \ldots, d - 1$ and any $q$-dimensional subspace $H$ on which $d - q$ of the coordinates are 0. Then $Q(H) \uparrow 1 - (d - q)/(\nu + d)$, the critical value for which $Q \notin \mathcal{U}_{d,\nu+d}$. Also, an amount of probability for $Q$ converging to $d/(\nu + d)$ is escaping to infinity. The Hessian, cf. (44), has $d$ arbitrarily small eigenvalues $\nu/(\nu + c^2)$.
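The Example can be reproduced directly. Below is a pure-Python sketch in arbitrary dimension $d$. It builds the atoms of $Q$, checks the fixed-point equation $(\nu+d)\int zz'/(\nu+z'z)\,dQ = I_d$ that characterizes $A_\nu(Q) = I_d$, and evaluates the difference of the two sides of (46), which is 4 times the quadratic form (44) at $A = I$. The function names and the parameter values $\nu = 2$, $d = 3$ are our own.

```python
import math


def example_law(nu, d, p):
    """Atoms (point, weight) of Q = (1 - 2dp) delta_0 + p * sum_j (delta_{-c e_j} + delta_{c e_j})."""
    if not 1.0 / (2 * (nu + d)) < p < 1.0 / (2 * d):
        raise ValueError("need 1/[2(nu + d)] < p < 1/(2d)")
    c = math.sqrt(nu / (2 * p * (nu + d) - 1.0))   # solves nu = c^2 [2p(nu + d) - 1]
    atoms = [([0.0] * d, 1.0 - 2 * d * p)]
    for j in range(d):
        for sign in (-1.0, 1.0):
            z = [0.0] * d
            z[j] = sign * c
            atoms.append((z, p))
    return c, atoms


def scatter_lhs(nu, d, atoms):
    """(nu + d) * integral of z z' / (nu + z'z) dQ(z); equals I_d when A_nu(Q) = I_d."""
    m = [[0.0] * d for _ in range(d)]
    for z, w in atoms:
        zz = sum(t * t for t in z)
        for i in range(d):
            for k in range(d):
                m[i][k] += (nu + d) * w * z[i] * z[k] / (nu + zz)
    return m


def hessian_form(nu, d, atoms, delta):
    """||Delta||_F^2 - (nu + d) * int (z'Delta z)^2 / (nu + z'z)^2 dQ: the gap in (46)."""
    fro = sum(delta[i][k] ** 2 for i in range(d) for k in range(d))
    tot = 0.0
    for z, w in atoms:
        zz = sum(t * t for t in z)
        q = sum(z[i] * delta[i][k] * z[k] for i in range(d) for k in range(d))
        tot += w * q * q / (nu + zz) ** 2
    return fro - (nu + d) * tot


nu, d = 2.0, 3
e11 = [[1.0 if (i, k) == (0, 0) else 0.0 for k in range(d)] for i in range(d)]
for p in (0.11, 0.1001):
    c, atoms = example_law(nu, d, p)
    print(p, c, hessian_form(nu, d, atoms, e11), nu / (nu + c * c))
```

As $p$ decreases toward $1/[2(\nu+d)] = 0.1$, $c$ grows and the value of the form in the direction $e_1e_1'$ shrinks like $\nu/(\nu+c^2)$. In off-diagonal directions it stays equal to $\|\Delta\|_F^2$.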

For the relatively open set $\mathcal{P}_d \subset \mathcal{S}_d$ and $G_{(\nu)}$ from (39), define the function $F := F_\nu$ from $X \times \mathcal{P}_d$ into $\mathcal{S}_d$ by

(47) $$F(\phi, A) := \phi^*(G_{(\nu)}(\cdot, A)).$$

Then $F$ is well-defined because $G_{(\nu)}(\cdot, A)$ is a bounded and Lipschitz $\mathcal{S}_d$-valued function of $x$ for each $A \in \mathcal{P}_d$. In fact, each entry is differentiable with bounded derivative, as is straightforward to check.
For $d = 1$ and a finite signed Borel measure $\tau$, let

(48) $$\|\tau\|_K := \sup_x|\tau((-\infty, x])|.$$

Let $P$ and $Q$ be two laws with distribution functions $F_P$ and $F_Q$. Then $\|P - Q\|_K$ is the usual sup (Kolmogorov) norm distance $\sup_x|(F_Q - F_P)(x)|$.
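For two empirical laws, the norm (48) can be computed exactly. Both distribution functions are right-continuous step functions, so the supremum is attained at one of the pooled sample points. Below is a short Python sketch using the standard `bisect` module. The function names are ours.

```python
from bisect import bisect_right


def ecdf(sample):
    """Right-continuous empirical distribution function of a finite sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def kolmogorov_distance(sample_p, sample_q):
    """||P_n - Q_m||_K = sup_x |F_Q(x) - F_P(x)| for two empirical laws.

    The difference of the two step functions is constant between pooled jump
    points, so it suffices to evaluate it at those points.
    """
    fp, fq = ecdf(sample_p), ecdf(sample_q)
    return max(abs(fq(x) - fp(x)) for x in set(sample_p) | set(sample_q))


print(kolmogorov_distance([0, 1, 2, 3], [0.5, 1.5, 2.5, 3.5]))   # 0.25
```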

The next statement and its proof call on some basic notions and facts from 
infinite-dimensional calculus, which are reviewed in the Appendix. 

Theorem 9. Let $\nu > 0$ in parts (a) through (c), and $\nu > 1$ in parts (d), (e).

(a) The function $F = F_\nu$ is analytic from $X \times \mathcal{P}_d$ into $\mathcal{S}_d$, where $X = BL^*(\mathbb{R}^d)$.

(b) Let $Q \in \mathcal{U}_{d,\nu+d}$ be any law, with corresponding $\phi_Q \in X$, and let $A_\nu(Q)$ be given by Theorem 6(a). At $A_\nu(Q)$, the partial derivative linear map $\partial F(\phi_Q, A)/\partial C := \nabla_C F(\phi_Q, A)$ from $\mathcal{S}_d$ into $\mathcal{S}_d$ is invertible.

(c) Still for $Q \in \mathcal{U}_{d,\nu+d}$, the functional $Q \mapsto A_\nu(Q)$ is analytic for the $BL^*$ norm.

(d) For each $P \in \mathcal{V}_{d,\nu+d}$, the $t_\nu$ location-scatter functional $P \mapsto (\mu_\nu, \Sigma_\nu)(P)$ given by Theorem 6 is also analytic for the norm on $X$.

(e) For $d = 1$, the $t_\nu$ location and scatter functionals $\mu_\nu$, $\sigma_\nu^2$ are analytic on $\mathcal{V}_{1,\nu+1}$ with respect to the sup norm $\|\cdot\|_K$.

Proof. (a): The function $(\phi, f) \mapsto \phi(f)$ is a bounded bilinear operator, hence analytic, from $BL^*(\mathbb{R}^d) \times BL(\mathbb{R}^d)$ into $\mathbb{R}$. The composition of analytic functions is analytic, so it will suffice to show that $A \mapsto G_{(\nu)}(\cdot, A)$ from (39) is analytic from the relatively open set $\mathcal{P}_d \subset \mathcal{S}_d$ into $BL(\mathbb{R}^d, \mathcal{S}_d)$. By easy reductions, it will suffice to show that $C \mapsto (y \mapsto yy'/(\nu + y'Cy))$ is analytic from $\mathcal{P}_d$ into $BL(\mathbb{R}^d, \mathcal{S}_d)$. Fixing $C = A^{-1}$ and considering $C + \Delta$ for sufficiently small $\Delta \in \mathcal{S}_d$, we get

(49) $$\frac{yy'}{\nu + y'(C + \Delta)y} = \sum_{j=0}^\infty\frac{(-y'\Delta y)^j}{(\nu + y'Cy)^{j+1}}\,yy',$$

which we would like to show gives the desired Taylor expansion around $C$. For $j = 1, 2, \ldots$ let $g_j(y) := (-y'\Delta y)^j(\nu + y'Cy)^{-j-1} \in \mathbb{R}$, and let $f_j$ be the $j$th term of (49), $f_j(y) := g_j(y)yy' \in \mathcal{S}_d$. It's easily seen that for each $j$, $f_j$ is a bounded Lipschitz function into $\mathcal{S}_d$. Since $\nu + y'Cy \ge \nu + |y|^2/\|A\|$, we have for all $y$

(50) $$|g_j(y)| \le \frac{\|\Delta\|^j|y|^{2j}}{(\nu + |y|^2/\|A\|)^{j+1}}.$$

Let $\|\cdot\|_F$ be the Frobenius norm on $\mathcal{S}_d$. Since $\|yy'\|_F = |y|^2$, it follows that for all $y$

(51) $$\|f_j(y)\|_F \le \|\Delta\|^j\|A\|^{j+1}.$$

Thus for $\|\Delta\| < 1/\|A\|$, the series converges absolutely in the supremum norm.
To consider Lipschitz seminorms, for any $y$ and $z$ in $\mathbb{R}^d$ we have

$$\|f_j(y) - f_j(z)\|_F^2 = \operatorname{trace}\bigl[g_j(y)^2|y|^2yy' + g_j(z)^2|z|^2zz' - g_j(y)g_j(z)\{(y'z)yz' + (z'y)zy'\}\bigr]$$
$$= g_j(y)^2|y|^4 + g_j(z)^2|z|^4 - 2g_j(y)g_j(z)(y'z)^2.$$

So, letting $G(y, z) := g_j(y)g_j(z)(y'z)^2 \in \mathbb{R}$ for any $y, z \in \mathbb{R}^d$, we have

(52) $$\|f_j(y) - f_j(z)\|_F^2 = G(y, y) - 2G(y, z) + G(z, z).$$

To evaluate some gradients, we have $\nabla_y(y'By) = 2By$ for any $B \in \mathcal{S}_d$, and thus

$$\nabla_y g_j(y) = \frac{-2(-y'\Delta y)^{j-1}}{(\nu + y'Cy)^{j+2}}\bigl[j(\nu + y'Cy)\Delta y + (j + 1)(-y'\Delta y)Cy\bigr].$$

It follows that for all $y$

$$|\nabla_y g_j(y)| \le \frac{2(j + 1)\|\Delta\|^j|y|^{2j-1}(\nu + 2\|C\|\,|y|^2)}{(\nu + y'Cy)^{j+2}},$$

and so, since $\|A\|\,\|C\| \ge 1$,

(53) $$|\nabla_y g_j(y)| \le (4j + 4)\|\Delta\|^j\|A\|^{j+1/2}\|C\|(\nu + |y|^2/\|A\|)^{-3/2}.$$

Let $\nabla_1$ be the gradient with respect to the first of the two arguments. Then

$$\nabla_1 G(y, z) = (y'z)^2 g_j(z)\nabla_y g_j(y) + 2g_j(y)g_j(z)(y'z)z.$$

For any $u \in \mathbb{R}^d$, having in mind $u = u_t = y + t(z - y)$ with $0 \le t \le 1$, we have

(54) $$\nabla_1 G(u, z) - \nabla_1 G(u, y) = [(u'z)^2 g_j(z) - (u'y)^2 g_j(y)]\nabla_u g_j(u) + 2g_j(u)[g_j(z)(u'z)z - g_j(y)(u'y)y].$$

For the first factor in the first term on the right we will use

$$\nabla_v[(u'v)^2 g_j(v)] = 2g_j(v)(u'v)u + (u'v)^2\nabla_v g_j(v).$$

From (50) and (53) it follows that for all $u$ and $v$ in $\mathbb{R}^d$

$$|\nabla_v[(u'v)^2 g_j(v)]| \le \|\Delta\|^j\|A\|^j|u|^2\Bigl[\frac{2|v|}{\nu + |v|^2/\|A\|} + \frac{(4j + 4)\|A\|^{1/2}\|C\|\,|v|^2}{(\nu + |v|^2/\|A\|)^{3/2}}\Bigr].$$

Now, for all $v$, $2|v|/(\nu + |v|^2/\|A\|) \le \|A\|^{1/2}$ and $|v|^2/(\nu + |v|^2/\|A\|)^{3/2} \le \|A\|$. It follows, integrating along the line segment from $v = y$ to $v = z$ for each fixed $u$, that

$$|(u'z)^2 g_j(z) - (u'y)^2 g_j(y)| \le |z - y|\,\|\Delta\|^j\|A\|^{j+3/2}|u|^2(4j + 5)\|C\|.$$

By this and (53), since $|u|^2/(\nu + |u|^2/\|A\|)^{3/2} \le \|A\|$, the first term on the right in (54) is bounded above by

(55) $$(4j + 5)^2\|\Delta\|^{2j}\|A\|^{2j+3}\|C\|^2|y - z|.$$
For the second term on the right in (54), the second factor is $g_j(z)(u'z)z - g_j(y)(u'y)y$. The gradient of a vector-valued function is a matrix-valued function, in this case non-symmetric. We have

$$\nabla_v[g_j(v)(u'v)v] = (\nabla_v g_j(v))(u'v)v' + g_j(v)[uv' + (u'v)I].$$

It follows by (50) and (53) that for any $v$

$$\|\nabla_v[g_j(v)(u'v)v]\| \le \|\Delta\|^j\|A\|^{j+1/2}|u|\{2 + (4j + 4)\|A\|\,\|C\|\}.$$

Multiply by $2g_j(u)$ and integrate with respect to $v$ along the line segment from $v = y$ to $v = z$. This gives, for the second term on the right in (54),

$$|2g_j(u)[g_j(z)(u'z)z - g_j(y)(u'y)y]| \le \|\Delta\|^{2j}\|A\|^{2j+2}\|C\|(6j + 6)|z - y|.$$

Combining with (55) gives in (54)

$$|\nabla_1 G(u, z) - \nabla_1 G(u, y)| \le \|\Delta\|^{2j}\|A\|^{2j+2}\|C\|\{(4j + 5)^2\|A\|\,\|C\| + (6j + 6)\}|z - y|$$
$$\le \|\Delta\|^{2j}\|A\|^{2j+3}\|C\|^2(6j + 6)^2|z - y|.$$

Then integrating this bound with respect to $u$ on the line from $u = y$ to $u = z$ we get

$$|G(z, z) - 2G(y, z) + G(y, y)| \le \|\Delta\|^{2j}\|A\|^{2j+3}\|C\|^2(6j + 6)^2|y - z|^2,$$

and so, by (52), $\|f_j(y) - f_j(z)\|_F \le \|\Delta\|^j\|A\|^{j+3/2}\|C\|(6j + 6)|y - z|$. The right side of the latter inequality equals a factor linear in $j$, times $(\|\Delta\|\,\|A\|)^j$, times factors fixed for given $A$ that depend on neither $j$ nor $\Delta$. So the series (49) converges not only in the supremum norm but also in $\|\cdot\|_L$ for $\|\Delta\| < 1/\|A\|$. This finishes the proof of analyticity of $C \mapsto (y \mapsto yy'/(\nu + y'Cy))$ into $BL(\mathbb{R}^d, \mathcal{S}_d)$, and so of part (a).

For (b), $A_\nu$ exists by Theorem 3 with $u = u_{\nu,d}$, so that $a_0 = \nu + d > d$. The gradient of $F$ with respect to $C$ is the Hessian of $QH_\nu$. By Lemma 8 it is positive definite at the critical point $A_\nu$, and so non-singular.

For (c), by parts (a) and (b), all the hypotheses of the Hildebrandt-Graves implicit function theorem in the analytic case (e.g. Theorem 30(c) in the Appendix) hold at each point $(\phi_Q, A_\nu(Q))$. This gives the following conclusions. On some open neighborhood $U$ of $\phi_Q$ in $X$, there is a function $\phi \mapsto A_\nu(\phi)$ such that $F(\phi, A_\nu(\phi)) = 0$ for all $\phi \in U$. The function $A_\nu$ is $C^1$ and, since $F$ is analytic by part (a), so is $A_\nu$ on $U$. Existence of the implicit function in a $BL^*$ neighborhood of $\phi_Q$, together with Theorem 3, implies that $\mathcal{U}_{d,\nu+d}$ is a relatively $\|\cdot\|^*_{BL}$ open set of probability measures. It is therefore weakly open, since $\beta$ metrizes weak convergence. We know by Theorem 3, (15) and the form of $u_{\nu,d}$ that there is a unique solution $A_\nu(Q)$ for each $Q \in \mathcal{U}_{d,\nu+d}$. So the local functions on neighborhoods fit together to define one analytic function $A_\nu$ on $\mathcal{U}_{d,\nu+d}$, and part (c) is proved.

For part (d), we apply the previous parts with $d + 1$ and $\nu - 1$ in place of $d$ and $\nu$ respectively. Theorem 6 shows that in the $t_\nu$ case with $\nu > 1$, $\mu = \mu_\nu$ and $\Sigma = \Sigma_\nu$ give uniquely defined M-functionals of location and scatter. An earlier proposition shows that the relation (28) with $\gamma = 1$ gives an analytic homeomorphism, with analytic inverse, between matrices $A$ with $A_{d+1,d+1} = 1$ and pairs $(\mu, \Sigma)$. So (d) follows from (c) and the composition of analytic functions.

For part (e), consider the Taylor expansion (49) related to $G_{(\nu)}$, specialized to the case $d = 1$. Recall that we treat location-scatter in this case by way of pure scatter for $d = 2$: for a law $P$ on $\mathbb{R}$ we take the law $P \circ T_1^{-1}$ on $\mathbb{R}^2$, concentrated in vectors $(x, 1)'$. The bilinear form $(f, \tau) \mapsto \int f\,d\tau$ is jointly continuous with respect to the total variation norm on $f$,

$$\|f\|_{[1]} := \|f\|_{\sup} + \sup\Bigl\{\sum_{i=2}^k|f(x_i) - f(x_{i-1})| : -\infty < x_1 < \cdots < x_k < +\infty,\ k = 2, 3, \ldots\Bigr\},$$

and the sup (Kolmogorov) norm $\|\cdot\|_K$ on finite signed measures (48). Thus it will suffice to show, as for part (a), that the $\mathcal{S}_2$-valued Taylor series (49) has entries converging in total variation norm for $\|\Delta\| < 1/\|A\|$.

An entry of the $j$th term $f_j((x, 1)')$ of (49) is a rational function $R(x) = U(x)/V(x)$, where $V$ has degree $2j + 2$ and $U$ has degree $2j + i$ for $i = 0, 1$, or 2. We already know from §5 that $\|R\|_{\sup} \le \|\Delta\|^j\|A\|^{j+1}$. A zero of the derivative rational function $R'(x)$ is a zero of its numerator, which after reduction is a polynomial of degree at most $2j + 3$. Thus there are at most $2j + 3$ (real) zeroes. Between two adjacent zeroes of $R'$ the total variation of $R$ is at most $2\|R\|_{\sup}$. Between $\pm\infty$ and the largest or smallest zero of $R'$, the same holds. Thus the total variation norm $\|R\|_{[1]} \le (4j + 9)\|R\|_{\sup}$. Since $\sum_{j=1}^\infty(4j + 9)\|\Delta\|^j\|A\|^{j+1} < \infty$ for $\|\Delta\| < 1/\|A\|$, the conclusion follows. □
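The series (49) and the bound (51) from the proof of part (a) can be checked pointwise. Below is a Python sketch for $d = 2$. The values of $\nu$, $C$, $\Delta$ and the test points $y$ are our own, chosen so that $\|\Delta\|\,\|A\| < 1$. Operator norms come from closed-form eigenvalues of symmetric $2\times2$ matrices.

```python
import math


def quad(y, m):
    """Quadratic form y' M y for a 2-vector y."""
    return sum(y[i] * m[i][j] * y[j] for i in range(2) for j in range(2))


def eig2(m):
    """Eigenvalues (smaller, larger) of a symmetric 2x2 matrix."""
    half_tr = 0.5 * (m[0][0] + m[1][1])
    r = math.sqrt(0.25 * (m[0][0] - m[1][1]) ** 2 + m[0][1] ** 2)
    return half_tr - r, half_tr + r


def f_term(y, c, delta, nu, j):
    """f_j(y) = (-y'Delta y)^j (nu + y'Cy)^(-j-1) y y', the jth term of (49)."""
    g = (-quad(y, delta)) ** j / (nu + quad(y, c)) ** (j + 1)
    return [[g * y[i] * y[k] for k in range(2)] for i in range(2)]


def frob(m):
    """Frobenius norm of a 2x2 matrix."""
    return math.sqrt(sum(v * v for row in m for v in row))


nu = 1.5
c = [[2.0, 0.3], [0.3, 1.0]]
delta = [[0.2, 0.1], [0.1, -0.3]]
norm_a = 1.0 / eig2(c)[0]                       # ||A|| with A = C^{-1}
norm_delta = max(abs(e) for e in eig2(delta))   # ||Delta|| (operator norm)
y = (0.7, -1.9)
partial = [[sum(f_term(y, c, delta, nu, j)[i][k] for j in range(80)) for k in range(2)]
           for i in range(2)]
c_plus = [[c[i][k] + delta[i][k] for k in range(2)] for i in range(2)]
direct = [[y[i] * y[k] / (nu + quad(y, c_plus)) for k in range(2)] for i in range(2)]
print(partial, direct)
```

The same helpers can be used to confirm that $\|f_j(y)\|_F \le \|\Delta\|^j\|A\|^{j+1}$ holds uniformly, including for large $|y|$.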

Suppose a functional $T$ is differentiable at $P$ for a suitable norm, with a non-zero derivative. Then one can look for asymptotic normality of $\sqrt{n}(T(P_n) - T(P))$ by way of some central limit theorem and the delta-method. For this purpose the dual-bounded-Lipschitz norm $\|\cdot\|^*_{BL}$ works for large classes of distributions, but it is still too strong for some heavy-tailed distributions. For $d = 1$, let $P$ be a law concentrated in the positive integers with $\sum_{k=1}^\infty\sqrt{P(\{k\})} = +\infty$. Then a short calculation shows that, as $n \to \infty$, $\sqrt{n}\sum_{k=1}^\infty|(P_n - P)(\{k\})| \to +\infty$ in probability. For any numbers $a_k$ there is an $f \in BL(\mathbb{R})$, with the usual metric, such that $f(k)a_k = |a_k|$ for all $k$ and $\|f\|_{BL} \le 3$. For example, take $f(k) = \operatorname{sign}(a_k)$, interpolated linearly between integers, so that $\|f\|_{\sup} \le 1$ and the Lipschitz constant is at most 2. Thus $\sqrt{n}\|P_n - P\|^*_{BL} \to +\infty$ in probability. Giné and Zinn (1986) considered the related condition $\sum_{j=1}^\infty\Pr(j - 1 < |X| \le j)^{1/2} < \infty$ for $X$ with general distribution $P$ on $\mathbb{R}$. They proved it equivalent to the Donsker property [defined in Dudley (1999, §3.1)] of $\{f : \|f\|_{BL} \le 1\}$. But norms more directly adapted to the functions needed will be defined in the following section.
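The "short calculation" can be made concrete at the level of expectations. For $P(\{k\}) = 6/(\pi^2k^2)$ we have $\sum_k\sqrt{P(\{k\})} = +\infty$. Each count $nP_n(\{k\})$ is Binomial$(n, p_k)$, so $\sqrt{n}\,E\sum_k|(P_n-P)(\{k\})|$ can be computed exactly. The sketch below uses de Moivre's closed form for the mean absolute deviation of a binomial variable. The choice of $P$, the truncation at `kmax` and the function names are ours. The text's statement concerns convergence in probability, whereas this sketch only tracks the expectation, which grows without bound but slowly.

```python
import math


def binom_mad(n, p):
    """E|X - np| for X ~ Binomial(n, p), 0 < p < 1, by de Moivre's closed form
    2 m C(n, m) p^m (1 - p)^(n - m + 1) with m = floor(np) + 1."""
    m = math.floor(n * p) + 1
    log_val = (math.log(2 * m) + math.lgamma(n + 1) - math.lgamma(m + 1)
               - math.lgamma(n - m + 1) + m * math.log(p) + (n - m + 1) * math.log1p(-p))
    return math.exp(log_val)


def scaled_l1_deviation(n, kmax=20000):
    """sqrt(n) * E sum_k |(P_n - P)({k})| for P({k}) = 6 / (pi^2 k^2), truncated at kmax."""
    total = 0.0
    for k in range(1, kmax + 1):
        p = 6.0 / (math.pi ** 2 * k * k)
        total += binom_mad(n, p) / n
    return math.sqrt(n) * total


for n in (100, 1000, 10000):
    print(n, scaled_l1_deviation(n))
```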
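6. Some Banach spaces generated by rational functions

For the facts in this section, proofs are omitted if they are short and easy, or given briefly if they are longer. More details are given in Dudley, Sidenko, Wang and Yang (2007). Throughout this section let $0 < \delta < 1$, $d = 1, 2, \ldots$ and $r = 1, 2, \ldots$ be arbitrary unless further specified. Let $\mathcal{MM}_r$ be the set of monic monomials $g$ from $\mathbb{R}^d$ into $\mathbb{R}$ of degree $r$, in other words $g(x) = \prod_{j=1}^d x_j^{n_j}$ for some $n_j \in \mathbb{N}$ with $\sum_{j=1}^d n_j = r$. Let

$$\mathcal{F}_{\delta,r} := \mathcal{F}_{\delta,r,d} := \Bigl\{f : \mathbb{R}^d \to \mathbb{R},\ f(x) = g(x)\Big/\prod_{s=1}^r(1 + x'C_s x),\ \text{where } g \in \mathcal{MM}_{2r} \text{ and, for } s = 1, \ldots, r,\ C_s \in W_\delta\Bigr\}.$$

For $1 \le j \le r$, let $\mathcal{F}^{(j)}_{\delta,r} := \mathcal{F}^{(j)}_{\delta,r,d}$ be the set of $f \in \mathcal{F}_{\delta,r}$ such that $C_s$ has at most $j$ different values (depending on $f$). Then $\mathcal{F}^{(r)}_{\delta,r} = \mathcal{F}_{\delta,r}$. Let $\mathcal{G}^{(j)}_{\delta,r} := \mathcal{G}^{(j)}_{\delta,r,d} := \bigcup_{k=1}^r\mathcal{F}^{(j)}_{\delta,k}$. We will be interested in $j = 1$ and 2. Clearly $\mathcal{F}^{(1)}_{\delta,r} \subset \mathcal{F}^{(2)}_{\delta,r} \subset \cdots \subset \mathcal{F}^{(r)}_{\delta,r}$ for each $\delta$ and $r$.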

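Let $h_C(x) := 1 + x'Cx$ for $C \in \mathcal{P}_d$ and $x \in \mathbb{R}^d$. Then clearly $f \in \mathcal{F}^{(1)}_{\delta,r}$ if and only if $f(x) = f_{P,C,r}(x) := P(x)h_C(x)^{-r}$ for some $P \in \mathcal{MM}_{2r}$ and $C \in W_\delta$. The next two lemmas are straightforward:

Lemma 10. For any $j$ and any $f \in \mathcal{G}^{(j)}_{\delta,r}$ we have $(\delta/d)^r \le \|f\|_{\sup} \le \delta^{-r}$.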
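A numerical illustration of Lemma 10 for $f = f_{P,C,r} \in \mathcal{F}^{(1)}_{\delta,r}$ with $d = 2$ follows. The choices $\delta = 1/2$, $r = 2$, $P(x) = x_1^3x_2$ and the matrix $C$ are ours, as are the function names. The upper bound follows from $|P(x)| \le |x|^{2r}$ and $1 + x'Cx \ge 1 + \delta|x|^2$. The lower bound comes from letting $x = t(1,\ldots,1)$ with $t \to \infty$, where $f$ tends to $(\mathbf{1}'C\mathbf{1})^{-r} \ge (\delta/d)^r$.

```python
import math


def eig2(m):
    """Eigenvalues (smaller, larger) of a symmetric 2x2 matrix."""
    half_tr = 0.5 * (m[0][0] + m[1][1])
    r = math.sqrt(0.25 * (m[0][0] - m[1][1]) ** 2 + m[0][1] ** 2)
    return half_tr - r, half_tr + r


def in_W(c, delta):
    """C in W_delta: max(||C||, ||C^{-1}||) < 1/delta for symmetric positive definite C."""
    lo, hi = eig2(c)
    return lo > 0 and max(hi, 1.0 / lo) < 1.0 / delta


def f_pcr(x, n1, n2, c, r):
    """f_{P,C,r}(x) = x1^n1 x2^n2 / (1 + x'Cx)^r with n1 + n2 = 2r (d = 2)."""
    q = c[0][0] * x[0] ** 2 + 2 * c[0][1] * x[0] * x[1] + c[1][1] * x[1] ** 2
    return x[0] ** n1 * x[1] ** n2 / (1.0 + q) ** r


delta, d, r = 0.5, 2, 2
c = [[1.5, 0.3], [0.3, 0.8]]
assert in_W(c, delta)
grid = [(0.5 * i, 0.5 * k) for i in range(-100, 101) for k in range(-100, 101)]
sup_est = max(abs(f_pcr(x, 3, 1, c, r)) for x in grid)
far = f_pcr((1e4, 1e4), 3, 1, c, r)   # along the diagonal: close to (1'C1)^(-r)
print(sup_est, far, (delta / d) ** r, delta ** -r)
```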

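Lemma 11. Let $f = f_{P,C,r}$ and $g = f_{P,D,r}$ for some $P \in \mathcal{MM}_{2r}$ and $C, D \in \mathcal{P}_d$. Then

$$(f - g)(x) = \frac{x'(D - C)x\,P(x)\sum_{j=0}^{r-1}h_D(x)^{r-1-j}h_C(x)^j}{(h_C h_D)(x)^r}.$$

For $1 \le k \le l \le d$ and $j = 0, 1, \ldots, r - 1$, let

$$h_{C,D,k,l,r,j}(x) := x_k x_l P(x)h_C(x)^{j-r}h_D(x)^{-1-j}.$$

If $C, D \in W_\delta$, then each $h_{C,D,k,l,r,j}$ is in $\mathcal{F}^{(2)}_{\delta,r+1}$, and

(57) $$g - f = -\sum_{1 \le k \le l \le d}\ \sum_{j=0}^{r-1}(D_{kl} - C_{kl})(2 - \delta_{kl})h_{C,D,k,l,r,j}.$$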
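Identity (57) rests on the telescoping formula $a^r - b^r = (a-b)\sum_{j=0}^{r-1}a^{r-1-j}b^j$, applied with $a = h_D(x)$ and $b = h_C(x)$. It also uses $x'(D-C)x = \sum_{k\le l}(2-\delta_{kl})(D-C)_{kl}x_kx_l$. Below is a quick Python check at a few points for $d = 2$ and $r = 2$. The monomial $P(x) = x_1^3x_2$, the matrices and the function names are our own illustrative choices.

```python
def h(c, x):
    """h_C(x) = 1 + x'Cx for d = 2."""
    return 1.0 + c[0][0] * x[0] ** 2 + 2 * c[0][1] * x[0] * x[1] + c[1][1] * x[1] ** 2


def P(x):
    """An example monic monomial of degree 2r = 4: x1^3 x2."""
    return x[0] ** 3 * x[1]


def rhs_57(c, dm, x, r):
    """Right side of (57): -sum_{k<=l} sum_j (D_kl - C_kl)(2 - delta_kl) h_{C,D,k,l,r,j}(x)."""
    total = 0.0
    for k in range(2):
        for l in range(k, 2):
            factor = (dm[k][l] - c[k][l]) * (2 - (1 if k == l else 0))
            for j in range(r):
                total += factor * x[k] * x[l] * P(x) * h(c, x) ** (j - r) * h(dm, x) ** (-1 - j)
    return -total


r = 2
c = [[1.5, 0.3], [0.3, 0.8]]
dm = [[1.2, -0.1], [-0.1, 1.1]]
points = [(0.4, -1.3), (2.0, 0.5), (-3.0, 1.7)]
for x in points:
    lhs = P(x) / h(dm, x) ** r - P(x) / h(c, x) ** r     # g - f
    print(lhs, rhs_57(c, dm, x, r))
```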

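For any $f : \mathbb{R}^d \to \mathbb{R}$, define

$$\|f\|^{(j)}_{\delta,r} := \inf\Bigl\{\sum_{s=1}^\infty|\lambda_s| : f = \sum_{s=1}^\infty\lambda_s g_s,\ g_s \in \mathcal{G}^{(j)}_{\delta,r}\Bigr\},$$

or $+\infty$ if no such $\lambda_s$, $g_s$ with $\sum_s|\lambda_s| < \infty$ exist. Lemma 10 implies that for $\sum_s|\lambda_s| < \infty$ and $g_s \in \mathcal{G}^{(j)}_{\delta,r}$, $\sum_s\lambda_s g_s$ converges absolutely and uniformly on $\mathbb{R}^d$. Let $\mathcal{Y}^{(j)}_{\delta,r} := \mathcal{Y}^{(j)}_{\delta,r,d} := \{f : \mathbb{R}^d \to \mathbb{R},\ \|f\|^{(j)}_{\delta,r} < \infty\}$. It's easy to see that each $\mathcal{Y}^{(j)}_{\delta,r}$ is a real vector space of functions on $\mathbb{R}^d$ and $\|\cdot\|^{(j)}_{\delta,r}$ is a seminorm on it. The next two lemmas and a proposition are rather straightforward to prove.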

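Lemma 12. For any $j = 1, 2, \ldots$,

(a) If $f \in \mathcal{G}^{(j)}_{\delta,r}$ then $f \in \mathcal{Y}^{(j)}_{\delta,r}$ and $\|f\|^{(j)}_{\delta,r} \le 1$.

(b) For any $g \in \mathcal{Y}^{(j)}_{\delta,r}$, $\|g\|_{\sup} \le \|g\|^{(j)}_{\delta,r}/\delta^r < \infty$.

(c) If $f \in \mathcal{G}^{(j)}_{\delta,r}$ then $\|f\|^{(j)}_{\delta,r} \ge (\delta^2/d)^r$.

(d) $\|\cdot\|^{(j)}_{\delta,r}$ is a norm on $\mathcal{Y}^{(j)}_{\delta,r}$.

(e) $\mathcal{Y}^{(j)}_{\delta,r}$ is complete for $\|\cdot\|^{(j)}_{\delta,r}$ and thus a Banach space.

Lemma 13. For any $j = 1, 2, \ldots$, we have $\mathcal{Y}^{(j)}_{\delta,r} \subset \mathcal{Y}^{(j)}_{\delta,r+1}$. The inclusion linear map from $\mathcal{Y}^{(j)}_{\delta,r}$ into $\mathcal{Y}^{(j)}_{\delta,r+1}$ has norm at most 1.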

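Proposition 14. For any $P \in \mathcal{MM}_{2r}$, let $\psi(C, x) := f_{P,C,r}(x) = P(x)/h_C(x)^r$ from $W_\delta \times \mathbb{R}^d$ into $\mathbb{R}$. Then:

(a) For each fixed $C \in W_\delta$, $\psi(C, \cdot) \in \mathcal{F}^{(1)}_{\delta,r}$.

(b) For each $x$, $\psi(\cdot, x)$ has partial derivative $\nabla_C\psi(C, x) = -rP(x)xx'/h_C(x)^{r+1}$.

(c) The map $C \mapsto \nabla_C\psi(C, \cdot) \in \mathcal{S}_d$ on $W_\delta$ has entries Lipschitz into $\mathcal{Y}^{(2)}_{\delta,r+2}$.

(d) The map $C \mapsto \psi(C, \cdot)$ from $W_\delta$ into $\mathcal{F}^{(1)}_{\delta,r} \subset \mathcal{Y}^{(1)}_{\delta,r}$, viewed as a map into the larger space $\mathcal{Y}^{(2)}_{\delta,r+2}$, is Fréchet $C^1$.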

Theorem 15. Let $r = 1, 2, \ldots$, $d = 1, 2, \ldots$, $0 < \delta < 1$, and $f \in \mathcal{Y}^{(1)}_{\delta,r}$. Then for some $a_s$ with $\sum_s|a_s| < \infty$ we have $f(x) = \sum_s a_s P_s(x)/(1 + x'C_s x)^{k_s}$ for $x \in \mathbb{R}^d$, where each $P_s \in \mathcal{MM}_{2k_s}$, $k_s \in \{1, \ldots, r\}$, and $C_s \in W_\delta$. Moreover, $f$ can be written as a sum of the same form in which the triples $(P_s, C_s, k_s)$ are all distinct. In that case, the $C_s$, $P_s$, $k_s$ and the coefficients $a_s$ are uniquely determined by $f$.

Proof. If $d = 1$, then $P_s(x) = x^{2k_s}$ and $C_s \in (\delta, 1/\delta)$ for all $s$. We can assume the pairs $(C_s, k_s)$ are all distinct. We need to show that if $f(x) = 0$ for all real $x$ then all $a_s = 0$. Suppose not. Any $f$ of the given form extends to a function of a complex variable $z$, holomorphic except for possible singularities on the two line segments where $\Re z = 0$ and $\sqrt{\delta} \le |\Im z| \le 1/\sqrt{\delta}$. If $f = 0$ on $\mathbb{R}$, then $f = 0$ also outside the two segments. For a given $C_s$ take the largest $k_s$ with $a_s \ne 0$. Then by dominated convergence for sums, $|a_s|$ is a positive constant multiple of $\lim_{\eta\downarrow 0}\eta^{k_s}|f(\eta + i/\sqrt{C_s})|$. That limit is 0, a contradiction (cf. Ross and Shapiro, 2002, Proposition 3.2.2).

Now for $d > 1$, consider lines $x = yu \in \mathbb{R}^d$ for $y \in \mathbb{R}$ and any $u \in \mathbb{R}^d$ with $|u| = 1$. We can assume the triples $(P_s, C_s, k_s)$ are all distinct by summing terms where they are the same (there are just finitely many possibilities for $P_s$). There exist $u$ such that $P_s(u) \ne P_t(u)$ whenever $P_s \ne P_t$, and $u'C_s u \ne u'C_t u$ whenever $C_s \ne C_t$. In fact almost all $u$ with $|u| = 1$ qualify, in a surface measure or category sense, since this is a countable set of conditions, each holding except on a sparse set of $u$'s in the unit sphere. Fixing such a $u$ we then reduce to the case $d = 1$. □




For any $P \in \mathcal{MM}_{2r}$ and any $C \ne D$ in $W_\delta$, let

$$f_{P,C,D}(x) := f_{P,C,D,r}(x) := \frac{P(x)}{(1 + x'Cx)^r} - \frac{P(x)}{(1 + x'Dx)^r}.$$

By Lemma 11, for $C$ fixed and $D \to C$ we have $\|f_{P,C,D,r}\|^{(2)}_{\delta,r+1} \to 0$. The following shows this is not true if $r + 1$ in the norm is replaced by $r$. This holds even if the number of different $C$'s in the denominator is allowed to be as large as possible, namely $r$:

Proposition 16. For any $r = 1, 2, \ldots$, $d = 1, 2, \ldots$, and $C \ne D$ in $W_\delta$, we have $\|f_{P,C,D,r}\|^{(r)}_{\delta,r} = 2$.

The proof is similar to that of the preceding theorem.

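Let $h_{C,\nu}(x) := \nu + x'Cx$, $r = 1, 2, \ldots$, $P \in \mathcal{MM}_{2r}$, and

$$\psi_{(\nu)}(C, x) := \psi_{(\nu),r,P}(C, x) := P(x)/h_{C,\nu}(x)^r.$$

Then $\psi_{(\nu)}(C, x) = \nu^{-r}\psi(C/\nu, x)$, and we get an alternate form of Proposition 14:

Proposition 17. For any $d = 1, 2, \ldots$, $r = 1, 2, \ldots$, and $0 < \delta < 1$,

(a) For each $C \in W_\delta$, $\psi_{(\nu)}(C, \cdot) \in \mathcal{F}^{(1)}_{\delta/\nu,r}$.

(b) For each $x$, $\psi_{(\nu)}(\cdot, x)$ has the partial derivative

$$\nabla_C\psi_{(\nu)}(C, x) = -rP(x)xx'/(\nu h_{C/\nu}(x))^{r+1} = -rP(x)xx'/h_{C,\nu}(x)^{r+1}.$$

(c) The map $C \mapsto \nabla_C\psi_{(\nu)}(C, \cdot) \in \mathcal{S}_d$ on $W_\delta$ has entries Lipschitz into $\mathcal{Y}^{(2)}_{\delta/\nu,r+2}$.

(d) The map $C \mapsto \psi_{(\nu)}(C, \cdot)$ from $W_\delta$ into $\mathcal{F}^{(1)}_{\delta/\nu,r}$, viewed as a map into $\mathcal{Y}^{(2)}_{\delta/\nu,r+2}$, is Fréchet $C^1$.

Let $\mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,r}$ be the set of all functions $c + g$ on $\mathbb{R}^d$ for any $c \in \mathbb{R}$ and $g \in \mathcal{Y}_{\delta/\nu,r}$. Then $c$ and $g$ are uniquely determined since $g(0) = 0$. Let $\|c + g\|_{\delta,r,\nu} := |c| + \|g\|_{\delta/\nu,r}$.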

7. Further differentiability and the delta-method 

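Let $0 < \delta < 1$, $C \in W_\delta$, $\Delta \in \mathcal{S}_d$, and $k = 0, 1, 2, \ldots$. By (11) and formulas in the Appendix, the $k$th differential of $G_{(\nu)}$ from (39) with respect to $C$ is given by

(59) $$d^k_C G_{(\nu)}(y, A)\Delta^{\otimes k} = K_k(A)\Delta^{\otimes k} + g_k(y, A, \Delta)$$

with values in $\mathcal{S}_d$, where

$$g_k(y, A, \Delta) := \frac{\nu+d}{2}(-1)^k k!\,\frac{(y'\Delta y)^k\,yy'}{(\nu + y'Cy)^{k+1}},$$

for some $k$-homogeneous polynomial $K_k(A)(\cdot)$ not depending on $y$. For $\Delta \in \mathcal{S}_d$, by the Cauchy inequality, $\sum_{i,j=1}^d|\Delta_{ij}| \le \|\Delta\|_F\,d$. So each entry $g_k(\cdot, A, \Delta)_{ij}$, $i, j = 1, \ldots, d$, is in $\mathcal{Y}_{\delta/\nu,k+1,d}$, with

(60) $$\|g_k(\cdot, A, \Delta)_{ij}\|_{\delta/\nu,k+1} \le (\nu + d)k!(\|\Delta\|_F\,d/\nu)^k.$$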
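Thus $(d^k_C G_{(\nu)}(\cdot, A)\Delta^{\otimes k})_{ij} \in \mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,k+1,d}$. Let $X_{\delta,r,\nu}$ be the dual Banach space of $\mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,r,d}$, i.e. the space of real-valued linear functionals $\phi$ on it for which the norm

$$\|\phi\|^*_{\delta,r,\nu} := \sup\{|\phi(f)| : \|f\|_{\delta,r,\nu} \le 1\} < \infty.$$

Let $X^0_{\delta,r,\nu} := \{\phi \in X_{\delta,r,\nu} : \phi(c) = 0 \text{ for all } c \in \mathbb{R}\}$. For $\phi \in X^0_{\delta,r,\nu}$, by the definition of $\|\cdot\|_{\delta/\nu,r}$,

(61) $$\|\phi\|^*_{\delta,r,\nu} = \|\phi\|^0_{\delta,r,\nu} := \sup\{|\phi(0, g)| : \|g\|_{\delta/\nu,r} \le 1\} \le \sup\{|\phi(0, g)| : g \in \mathcal{G}_{\delta/\nu,r}\}.$$

For $A \in W_{\delta,d}$ as defined in (40) and $\phi \in X^0_{\delta,r,\nu}$, define $F(\phi, A)$ again by (47). This makes sense since, for any $r = 1, 2, \ldots$, $G_{(\nu)}$ has entries in $\mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,1,d} \subset \mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,r,d}$. Proposition 16, closely related to Theorem 15, implies that in the following theorem $k + 2$ cannot be replaced by $k + 1$.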

Theorem 18. Let $d = 1, 2, \ldots$, $k = 1, 2, \ldots$, $0 < \nu < \infty$, and $Q \in \mathcal{U}_{d,\nu+d}$. Then there is a $\delta$ with $0 < \delta < 1$ such that the conclusions of Theorem 9 hold, with the following changes: $X = X_{\delta,k+2,\nu}$ in place of $BL^*(\mathbb{R}^d)$, $W_{\delta,d}$ in place of $\mathcal{P}_d$, $\nu > 1$ in part (d), and analyticity replaced by $C^k$ in parts (a), (c), and (d).

Proof. To adapt the proof of (a), note that $A_\nu(Q)$ given by Theorem 6(a) exists and is in $W_\delta$ for some $\delta \in (0, 1)$. Fix such a $\delta$. For each $A \in W_\delta$ and entry $f = G_{(\nu)}(\cdot, A)_{ij}$, we have $f = c + g \in \mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,1,d}$, so $\phi(f)$ is defined for each $\phi \in X$. The map $C \mapsto G_{(\nu)}(\cdot, A)_{ij}$ is Fréchet $C^1$ from $W_\delta$ into $\mathbb{R} \oplus \mathcal{Y}_{\delta/\nu,3,d}$. This holds by Proposition 17(d), and because the term $-A/2$ in (39), which does not depend on $y$, is analytic, thus $C^\infty$, with respect to $C = A^{-1}$. Now for $k \ge 2$ and $r = k - 1$ we consider $d^r_C G_{(\nu)}(\cdot, A)\Delta^{\otimes r}$ in (59) in place of $G_{(\nu)}(\cdot, A)$, and spaces $\mathcal{Y}_{\delta/\nu,2m-1+r,d}$ in place of $\mathcal{Y}_{\delta/\nu,2m-1,d}$, $m = 1, 2$. Each additional differentiation with respect to $C$ adds 1 to the power of $\nu + y'Cy$ in the denominator. Then the proof of (a), now proving $C^k$ under the corresponding hypothesis, can proceed as before.

For (b), the Hessian is the same as before.

For (c), let $Q \in \mathcal{U}_{d,\nu+d}$ and $\delta > 0$ be such that $A_\nu(Q) \in W_{\delta,d}$. Then parts (a) and (b) give the hypotheses of the Hildebrandt-Graves implicit function theorem, $C^k$ case (Theorem 30(b) in the Appendix). As before, there is a $\|\cdot\|^*_{\delta,k+2,\nu}$ neighborhood $V$ of $\phi_Q$ on which the implicit function, say $A_\nu$, exists. By taking $V$ small enough, we can get $A_\nu(\phi) \in W_\delta$ for all $\phi \in V$. For any $Q' \in \mathcal{U}_{d,\nu+d}$ such that $\phi_{Q'} \in V$, we have uniqueness $A_\nu(\phi_{Q'}) = A_\nu(Q')$ by Theorem 2. The implicit function theorem gives the $C^k$ property of $A_\nu$ on $V$ with respect to $\|\cdot\|^*_{\delta,k+2,\nu}$. Thus this property applies to $A_\nu(\cdot)$ on those $Q$ with $\phi_Q \in V$, proving (c).

Part (d) then follows as before, again using earlier parts with $(d + 1, \nu - 1)$ in place of $(d, \nu)$, and now with $C^k$. □

Here are some definitions and a proposition to prepare for the next theorem. Recall that $O(d)$ is the group of all orthogonal transformations of $\mathbb{R}^d$ onto itself ($d \times d$ orthogonal matrices). Then $O(d)$ is compact. Let $\lambda_d$ be the Haar measure on the Borel sets of $O(d)$ invariant under the action of $O(d)$ on itself, normalized so that $\lambda_d(O(d)) = 1$.

The Grassmannian $G(q, d)$ is the space of all $q$-dimensional vector subspaces of $\mathbb{R}^d$. Each $g \in O(d)$ defines a transformation of $G(q, d)$ onto itself. Fix $V \in G(q, d)$. For each Borel set $B \subset G(q, d)$, define a measure $\gamma_{d,q}(B) := \lambda_d(\{g \in O(d) : gV \in B\})$. Then $\gamma_{d,q}$ is a probability measure on $G(q, d)$, invariant under the action of $O(d)$. The following may well be known, but we do not know a reference for it.

Proposition 19. Let $Q$ be any law on $\mathbb{R}^d$ for $d \ge 2$. Then for each $q = 1, \ldots, d - 1$,
$\gamma_{d,q}\{H \in G(q, d) : Q(H) = Q(\{0\})\} = 1$.

Proof. Let $J(q) := J_Q(q) := \{H \in G(q, d) : Q(H) > Q(\{0\})\}$. For $q = 1$, the sets $H \setminus \{0\}$ for $H \in G(1, d)$ are disjoint, so $J(1)$ is countable and $\gamma_{d,1}(J(1)) = 0$.

We claim that if $1 \le q < r < d$ and $K \in G(q, d)$, then $\gamma_{d,r}\{H \in G(r, d) : K \subset H\} = 0$. It suffices to prove this for $q = 1$. Let $v$ be one of the two unit vectors $\pm v$ in $K$. Then for $g \in O(d)$, $K \subset gV$ if and only if $g^{-1}v \in V$. Now $g^{-1}v$ is uniformly distributed on the unit sphere, and so is in $V$ with probability 0, as claimed.

For $r = 1, \ldots, d - 1$, let $I(r)$ be the set of all subspaces $H \in J(r)$ such that there is no $K \in J(q)$ with $1 \le q < r$ and $K \subset H$. For any $H_1 \ne H_2$ in $I(r)$ we have $H_1 \cap H_2 \in G(m, d)$ for some $m < r$, and $Q((H_1 \cap H_2) \setminus \{0\}) = 0$ by assumption. Thus the sets $H \setminus \{0\}$ for $H \in I(r)$ are essentially disjoint for $Q$, with probability $> 0$, so $I(r)$ is countable for each $r$. It follows that for each $r = 1, \ldots, d - 1$,

$$\gamma_{d,r}(J(r)) \le \sum_{q=1}^r\gamma_{d,r}\{H \in G(r, d) : H \supset K \text{ for some } K \in I(q)\} = 0$$

by the claim and since each $I(q)$ is countable. The Proposition is proved. □

Here is a delta-method fact. 

Theorem 20. (a) Let $d = 2, 3, \ldots$, $\nu > 0$, and $Q \in \mathcal{U}_{d,\nu+d}$ with empirical measures $Q_n$. Then $Q_n \in \mathcal{U}_{d,\nu+d}$ with probability $\to 1$ as $n \to \infty$, and $\sqrt{n}(A_\nu(Q_n) - A_\nu(Q))$ converges in distribution to a normal distribution $N(0, S)$ on $\mathcal{S}_d$. The covariance matrix $S$ has full rank $d(d+1)/2$ if $Q$ is not concentrated in any set where a non-zero second-degree polynomial vanishes, e.g. if $Q$ has a density. For general $Q \in \mathcal{U}_{d,\nu+d}$, if $d = 1$ the rank is exactly 1, and for $d \ge 2$ the smallest possible rank of $S$ is $d - 1$.

(b) Let $d = 1, 2, \ldots$, $1 < \nu < \infty$ and $P \in \mathcal{V}_{d,\nu+d}$ with empirical measures $P_n$. Then $P_n \in \mathcal{V}_{d,\nu+d}$ with probability $\to 1$ as $n \to \infty$. Moreover, as $n \to \infty$,

$$\sqrt{n}\bigl[(\mu_\nu, \Sigma_\nu)(P_n) - (\mu_\nu, \Sigma_\nu)(P)\bigr]$$

converges in distribution to some normal distribution with mean 0 on $\mathbb{R}^d \times \mathbb{R}^{d^2}$, whose marginal on $\mathbb{R}^{d^2}$ is concentrated on $\mathcal{S}_d$. The covariance of the asymptotic normal distribution for $\mu_\nu(P_n)$ has full rank $d$. The rank of the covariance for $\Sigma_\nu(P_n)$ has the same behavior as the rank of $S$ in part (a).

Proof. Let $k = 1$ or larger. Choose $0 < \delta < 1$ such that $A_\nu = A_\nu(Q) \in W_\delta$. For (a), let $\mathcal{F}^{\delta,k,\nu} := \mathcal{G}_{\delta/\nu,k+2,d}$. To control differences $P_n - P$ on classes $\mathcal{F}^{\delta,k,\nu}$ we have the following.

By Lemma 10, for any $k = 1, 2, \ldots$, $\mathcal{F}^{\delta,k,\nu}$ is a uniformly bounded class of functions. It is a class of rational functions of the $y_j$ and $C_{kl}$ in which the polynomials in the numerators and denominators have degrees $\le m := 2k + 4$. Let $A(y)$ and $B(y)$ be any polynomials in $y$ of degrees at most $m$, with $B(y) > 0$ for all $y$ (as is the case here). Then for any real $c$, the set $\{y : A(y)/B(y) > c\}$ equals $\{y : (A - cB)(y) > 0\}$, where $A - cB$ is also a polynomial of degree at most $m$.

Let $\mathcal{E}(r, d)$ be the collection of all sets $\{x \in \mathbb{R}^d : p(x) > 0\}$ for all polynomials $p$ (in $d$ variables) of degree at most $r$. Then for each $r$ and $d$, $\mathcal{E}(r, d)$ is a VC (Vapnik-Chervonenkis) class of sets, e.g. Dudley (1999, Theorem 4.2.1). So $\mathcal{F}^{\delta,k,\nu}$ is a VC major class of functions for $\mathcal{E}(2k + 4, d)$, and a VC hull class (defined in Dudley [1999, pp. 159-160]). It is uniformly bounded and has sufficient measurability properties by continuity in the parameter $A \in \mathcal{P}_d$ [Dudley (1999, Theorem 5.3.8)]. It follows that $\mathcal{F}^{\delta,k,\nu}$ is a universal Donsker class [Dudley (1999, Corollary 6.3.16, Theorem 10.1.6)]. In other words, for any $\delta > 0$, $r = 1, 2, \ldots$ and any law $Q$, $\sqrt{n}\int f\,d(Q_n - Q)$ is asymptotically normal, uniformly for $f \in \mathcal{F}^{\delta,k,\nu}$: it converges to a Gaussian process $G_Q$ indexed by $f$. In particular we have the bounded Donsker property, i.e. $\sqrt{n}\|Q_n - Q\|^*_{\delta,k+2,\nu}$ is bounded in probability, where we now identify $\phi_Q$ with $Q$ and likewise for $Q_n$. Also, $\mathcal{F}^{\delta,k,\nu}$ is a uniform Glivenko-Cantelli class by Dudley, Giné and Zinn (1991, Theorem 6), so that $\|Q_n - Q\|^*_{\delta,k+2,\nu} \to 0$ almost surely as $n \to \infty$. Thus almost surely, for $n$ large enough, $Q_n \in V$ for the neighborhood $V$ of $Q$ defined in the proof of Theorem 18. So $Q_n \in \mathcal{U}_{d,\nu+d}$ and $A_\nu(Q_n)$ is defined.

By Theorem 18(c) for $k = 1$ and (61), we have

(62)  $A_\nu(Q_n) - A_\nu(Q) = (DA_\nu)(Q_n - Q) + o(\|Q_n - Q\|_{\delta,3,\nu})$

as $n \to \infty$. The remainder term is $o_p(1/\sqrt{n})$ by the bounded Donsker property mentioned above.

To make $DA_\nu$ more explicit, one can use partial derivatives of $F$ as follows. For any $\zeta \in X$ and $A_\nu := A_\nu(Q)$, we have $F(\phi_Q + \zeta, A_\nu) - F(\phi_Q, A_\nu) = F(\zeta, A_\nu)$, so the partial derivative of $F$ with respect to $\phi$ at $(\phi_Q, A_\nu)$ is the linear operator $D_\phi F : \zeta \mapsto \zeta(G_{(\nu)}(\cdot, A_\nu))$ from $X$ into $\mathbb{S}_d$, which is continuous since each entry of $G_{(\nu)}(\cdot, A_\nu)$ is in $\mathcal{F}_{\delta,k+2,\nu}$. The partial derivative of $F(\phi, A)$ with respect to $A$ at $A = A_\nu$ is given, as mentioned previously, by the Hessian (Hll), shown to be positive definite in Lemma [El].

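Recall the Hessian linear map $\mathcal{H} := \mathcal{H}_A$ from $\mathbb{S}_d$ to itself defined by (H5i). By a classical formula for derivatives of inverse functions, e.g. Deimling (1985, p. 150), $DA_\nu(\zeta) = -\mathcal{H}^{-1}D_\phi F(\phi_Q, A_\nu)(\zeta)$, from which

(63)  $DA_\nu(Q_n - Q) = -\mathcal{H}^{-1}\left(\int G_{(\nu)}(y, A_\nu)\,d(Q_n - Q)(y)\right).$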

Multiplying by $\sqrt{n}$, the resulting expression is asymptotically normal by a finite-dimensional central limit theorem.

The rank of the covariance is preserved by the nonsingular $\mathcal{H}^{-1}$. The rank is the largest size of a subset $S$ of the set $\{(i, j) : 1 \le i \le j \le d\}$ for which the functions $f_{ij}$ with $f_{ij}(y) := y_iy_j/(\nu + y'Cy)$ for $(i, j) \in S$ are linearly independent with respect to $Q$ modulo constant functions, i.e. there do not exist constants $a_{ij}$, $(i, j) \in S$, not all $0$, and a constant $c$ such that $\sum_{(i,j) \in S} a_{ij}f_{ij} = c$ almost surely for $Q$. By a linear change of variables we can assume that $A = I = C$.

For $d = 1$, $f_{11}$ cannot be constant a.s. since $Q \in \mathcal{U}_{1,\nu+1}$ is not concentrated in two points, so the rank (of the covariance) is exactly 1.

For any $d$, a linear dependence relation $\sum_{i \le j} a_{ij}f_{ij} = c$ with the $a_{ij}$ not all $0$ is equivalent to a quadratic polynomial equation $\sum_{i \le j} a_{ij}y_iy_j = c(\nu + y'y)$ holding a.s. $Q$. If no such equation holds, e.g. if $Q$ has a density, then the rank has its maximum possible value $d(d + 1)/2$.

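For any $d \ge 2$, let $e_j$, $j = 1, \dots, d$, be the standard unit vectors in $\mathbb{R}^d$. Let

$Q := \frac{1}{2d}\sum_{j=1}^{d}\left(\delta_{\sqrt{d}\,e_j} + \delta_{-\sqrt{d}\,e_j}\right).$

Then for each $i, j$, $(\nu + d)\int y_iy_j\,dQ(y)/(\nu + |y|^2) = 1_{\{i = j\}}$, so $A = I = C$ as desired. Clearly $f_{ij} = 0$ $Q$-a.s. for $i \ne j$. One can check that $Q \in \mathcal{U}_{d,\nu+d}$ for any $d \ge 2$ and $\nu > 0$.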

We have $\sum_{i=1}^d f_{ii}(y) = |y|^2/(\nu + |y|^2) = d/(\nu + d)$ almost surely with respect to $Q$, so the rank is at most $d - 1$. Conversely, consider $g(y) := \sum_{i=1}^{d-1} a_if_{ii}(y)$ where some $a_i \ne 0$, say (after relabeling) $a_1 \ne 0$. Then $g(y) = 0$ for $y = \pm\sqrt{d}\,e_d$ and $g(y) = a_1d/(\nu + d) \ne 0$ for $y = \pm\sqrt{d}\,e_1$, each occurring with $Q$-probability $> 0$, so $g$ is not constant a.s. $Q$, the $d - 1$ functions $f_{11}, \dots, f_{d-1,d-1}$ are not linearly dependent mod constants, and the rank is exactly $d - 1$ in this case.
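The example can be checked numerically. The following sketch (the function name `rank_example` is ours, not the paper's) builds the $2d$-point law $Q$ above, confirms the fixed-point identity $A = I$, and computes the rank of the covariance of $(f_{ij})_{i \le j}$ under $Q$, which should be $d - 1$. An explicit tolerance is passed to `numpy.linalg.matrix_rank` because the rank deficiency is exact and should not be left to the default, roundoff-dependent threshold.

```python
import numpy as np


def rank_example(d, nu):
    """Return (A, rank) for Q = (2d)^{-1} sum_j (delta_{sqrt(d) e_j} + delta_{-sqrt(d) e_j}).

    A = (nu + d) * int y y' / (nu + |y|^2) dQ(y), which should equal I_d;
    rank = rank of the Q-covariance of (f_ij)_{i <= j}, f_ij(y) = y_i y_j / (nu + |y|^2).
    """
    pts = np.vstack([np.sqrt(d) * np.eye(d), -np.sqrt(d) * np.eye(d)])  # the 2d atoms
    w = np.full(2 * d, 1.0 / (2 * d))                                  # their Q-masses
    denom = nu + np.sum(pts**2, axis=1)                                # nu + |y|^2 at each atom
    a = (nu + d) * (pts * (w / denom)[:, None]).T @ pts
    iu = np.triu_indices(d)                                            # index pairs i <= j
    feats = np.array([np.outer(y, y)[iu] / s for y, s in zip(pts, denom)])
    centered = feats - w @ feats
    cov = centered.T @ (centered * w[:, None])
    return a, np.linalg.matrix_rank(cov, tol=1e-10)


for d, nu in [(2, 1.0), (3, 2.0), (5, 0.5)]:
    a, r = rank_example(d, nu)
    print(d, nu, np.allclose(a, np.eye(d)), r)  # expect True and r == d - 1
```

The off-diagonal features vanish identically on the support of $Q$, and the diagonal ones sum to the constant $d/(\nu + d)$, which is exactly the one linear relation that drops the rank to $d - 1$.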

Now for $d \ge 2$ and any $Q \in \mathcal{U}_{d,\nu+d}$, still with $A = C = I$, by Proposition [US] and a rotation of coordinates we can assume that $Q(y_1 = 0) = Q(\{0\})$. We claim that then the functions $f_{1j}$ for $j = 2, \dots, d$ are linearly independent mod constants with respect to $Q$. Suppose that for some real $a_2, \dots, a_d$ not all $0$ and constant $c$, $y_1z(y)/(\nu + |y|^2) = c$ a.s. $Q$, where $z(y) := \sum_{j=2}^d a_jy_j$. Since $\int y_1y_j\,dQ(y)/(\nu + |y|^2) = 0$ for $j \ge 2$, we must have $c = 0$, and so

$1 = Q(y_1z(y) = 0) = Q(z(y) = 0) + Q(y_1 = 0 \ne z(y)),$

but the latter probability is $0$ by the choice of $y_1$. Thus $Q(z(y) = 0) = 1$, but $\{z(y) = 0\}$ is a $(d - 1)$-dimensional vector subspace, contradicting $Q \in \mathcal{U}_{d,\nu+d}$. Thus the rank is always at least $d - 1$ for $d \ge 2$, which is sharp by the example.

Now $\sqrt{n}(A_\nu(Q_n) - A_\nu(Q))$ has the same asymptotic normal distribution as $\sqrt{n}$ times the expression in (63), since the other term in (62) yields $\sqrt{n}\,o_p(1/\sqrt{n}) = o_p(1)$. So part (a) is proved.

For (b), we take $Q := P \circ T_1^{-1} \in \mathcal{U}_{d+1,\nu+d}$ and apply part (a) to it with $d, \nu$ replaced by $d + 1$, $\nu' = \nu - 1$. We can write $Q_n = P_n \circ T_1^{-1}$. As in part (a), we will have almost surely $P_n \in \mathcal{V}_{d,\nu+d}$ for $n$ large enough. From the resulting $A_{\nu'}$, we get $\mu_\nu$ and $\Sigma_\nu$ for $P$ and $P_n$ via Proposition [Hl](a) with $\gamma = 1$. Then $(\mu_\nu)_j = (A_{\nu'})_{j,d+1}$ for $j = 1, \dots, d$, both for $P, Q$ and for $P_n, Q_n$. We also have for $i, j = 1, \dots, d$,

(64)  $(\Sigma_\nu(P))_{ij} = (A_{\nu'}(Q))_{ij} - (A_{\nu'}(Q))_{i,d+1}(A_{\nu'}(Q))_{j,d+1}$

and likewise for $P_n$ and $Q_n$. This transformation of matrices, although nonlinear, is smooth enough to preserve asymptotic normality (the finite-dimensional delta-method), where the following will show how uniformity in the asymptotics is preserved:

Lemma 21. If random vectors $\{U_{in}\}_{i=1}^d$ for $n = 1, 2, \dots$ and a constant vector $\{U_i\}_{i=1}^d$ are such that as $n \to \infty$, $\sqrt{n}\{U_{in} - U_i\}_{i=1}^d$ converges in distribution to a normal distribution with mean $0$ on $\mathbb{R}^d$, then so does

(65)  $\sqrt{n}\left(\{U_{in} - U_i\}_{i=1}^d,\ \{U_{in}U_{jn} - U_iU_j\}_{1 \le i \le j \le d}\right)$

on $\mathbb{R}^{d(d+3)/2}$. For a family of $\{U_{in}\}$ and $\{U_i\}$ such that the $U_i$ are uniformly bounded and the convergence to normality of $\sqrt{n}(\{U_{in} - U_i\}_{i=1}^d)$ holds uniformly over the family, it does also for (65).

Proof. For one product term, we have

$U_{in}U_{jn} - U_iU_j = (U_{in} - U_i)U_j + U_i(U_{jn} - U_j) + (U_{in} - U_i)(U_{jn} - U_j),$

where the last term is $O_p(1/n)$ and so negligible, and the other terms are jointly asymptotically normal. The uniformity holds for the first two terms since the $U_i$ are uniformly bounded. Each factor in the last term is uniformly $O_p(1/\sqrt{n})$, so their product is uniformly $O_p(1/n)$. □
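A small Monte Carlo sketch can make the orders of magnitude in this proof visible. It is only an illustration under hypothetical assumptions of ours: $d = 2$, fixed constants $(U_1, U_2) = (1, -2)$, and $U_{in}$ the mean of $n$ i.i.d. draws $U_i + (\text{Uniform}(0,1) - 1/2)$. It checks the three-term identity on each replication and reports the average absolute sizes of the two linear terms, which stay of order 1 after multiplying by $\sqrt{n}$, and of the product term, which shrinks like $1/\sqrt{n}$.

```python
import math
import random


def product_delta_terms(n, reps, u=(1.0, -2.0), seed=0):
    """Average |.| of the three pieces of sqrt(n)(U_1n U_2n - u_1 u_2).

    Hypothetical setup: U_in = u_i + mean of n draws of (Uniform(0,1) - 1/2).
    Returns (linear term 1, linear term 2, product remainder).
    """
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    lin1 = lin2 = rem = 0.0
    for _ in range(reps):
        e1 = sum(rng.random() for _ in range(n)) / n - 0.5  # U_1n - u_1
        e2 = sum(rng.random() for _ in range(n)) / n - 0.5  # U_2n - u_2
        lhs = root_n * ((u[0] + e1) * (u[1] + e2) - u[0] * u[1])
        # the decomposition used in the proof of Lemma 21
        assert math.isclose(lhs, root_n * (e1 * u[1] + u[0] * e2 + e1 * e2), abs_tol=1e-9)
        lin1 += abs(root_n * e1 * u[1])
        lin2 += abs(root_n * u[0] * e2)
        rem += abs(root_n * e1 * e2)
    return lin1 / reps, lin2 / reps, rem / reps


for n in (100, 2500):
    print(n, product_delta_terms(n, 200))
```

Going from $n = 100$ to $n = 2500$ should leave the first two averages roughly unchanged and cut the third by a factor near $\sqrt{25} = 5$.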

Returning to the proof of Theorem 20(b): Lemma 21 for $U_{in} := A_{\nu'}(Q_n)_{i,d+1}$ and constants $U_i := A_{\nu'}(Q)_{i,d+1}$ gives asymptotic normality of $\sqrt{n}[\Sigma_\nu(P_n) - \Sigma_\nu(P)]$, using (64).

Via an affine transformation of $\mathbb{R}^d$, we can assume that $\mu_\nu(P) = 0$ and $\Sigma_\nu(P) = I_d$. Then for $Q = P \circ T_1^{-1}$ we get $A_{\nu'}(Q) = I_{d+1}$. If for some $a_1, \dots, a_d$ not all $0$ we have $\sum_{j=1}^d a_jy_jy_{d+1}/(\nu' + |y|^2) = c$ a.s. $(Q)$ for a constant $c$, we must have $c = 0$ and thus $\sum_{j=1}^d a_jy_jy_{d+1} = \sum_{j=1}^d a_jy_j = 0$ a.s. $(Q)$, where the latter equation also holds a.s. $(P)$, contradicting $P \in \mathcal{V}_{d,\nu+d}$. Thus the asymptotic normal distribution for $\mu_\nu(P_n)$ has full rank $d$. The rank of the covariance of the asymptotic normal distribution for $\Sigma_\nu(P_n)$ behaves as in part (a) by the same proof. Part (b) of the theorem is proved. □

Now, here is a statement on uniformity as $P$ and $Q$ vary. Recall $W_\delta$ as defined in (10).

Proposition 22. For any $\delta > 0$ and $M < \infty$, the rate of convergence to normality in Theorem 20(a) is uniform over the set $\mathcal{Q} := \mathcal{Q}(\delta, M, \nu)$ of all $Q \in \mathcal{U}_{d,\nu+d}$ such that $A_\nu(Q) \in W_\delta$ and

(66)  $Q(\{y : |y| > M\}) < (1 - \delta)/(\nu + d),$

or in part (b), over all $P \in \mathcal{V}_{d,\nu+d}$ such that $\Sigma_\nu(P) \in W_\delta$ and (66) holds for $P$ in place of $Q$.

Remark. The example after Lemma 8 shows that $A = A_\nu(Q)$ itself does not control $Q$ well enough to keep it away from the boundary of $\mathcal{U}_{d,\nu+d}$ or to give an upper bound on the norm of $\mathcal{H}^{-1}$, which is needed for uniformity in the limit theorem. For a class $\mathcal{Q}$ of laws to have uniform asymptotic normality of $A_\nu$, uniform tightness is not necessary, but a special case of uniform tightness, (66), is assumed.

Proof. A transformation as in the proof of Lemma 8 gives a law $q$ with $A_\nu(q) = I_d$ such that (66) holds with $Q$ replaced by $q$ and $M$ by $K := M/\sqrt{\delta}$, noting that $\tau_1 \le 1/\delta$, where $\tau_1$ is the largest eigenvalue of $A_\nu(Q)^{-1}$.

In the proof of Theorem 20 it was shown that for any $\delta > 0$ and $k = 1, 2, \dots$, $\mathcal{F}_{\delta,k+2,\nu}$ is a uniformly bounded VC major class of functions with sufficient measurability properties for empirical process limit theorems. To show that $\mathcal{F}_{\delta,k+2,\nu}$ is a uniform Donsker class in the sense defined and characterized by Giné and Zinn (1991), one can apply a convex hull property proved by Bousquet, Koltchinskii and Panchenko (2002).

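Take any $A \in \mathbb{S}_d$ with $\|A\|_F = 1$, i.e. $\operatorname{trace}(A^2) = 1$. In the following, probabilities and expectations are with respect to $q$. Let $X := (z'A^2z)/(\nu + z'z)$. Then $0 \le X \le 1$ for all $z$, and by the fixed-point equation $(\nu + d)\int zz'/(\nu + z'z)\,dq(z) = I$ expressing $A_\nu(q) = I$, $EX = \operatorname{trace}(A^2)/(\nu + d) = 1/(\nu + d)$. Thus, since $X \le 1$,

$\frac{1}{\nu + d} = EX \le \frac{\delta}{2(\nu + d)} + \Pr\left(X \ge \frac{\delta}{2(\nu + d)}\right),$

so $\Pr(X \ge \delta/[2(\nu + d)]) \ge (1 - \delta/2)/(\nu + d)$. Let $V := \{X \ge \delta/[2(\nu + d)],\ |z| \le K\}$. Then by (66) for $q$ and $K$ we have $\Pr(V) \ge \delta/[2(\nu + d)] > 0$. Let $S := z'z/(\nu + z'z)$, $Y := X1_V$ and $Z := X1_{V^c}$. Then, since $S \le 1$ everywhere and $S \le K^2/(\nu + K^2)$ on $V$,

$E(XS) = E((Y + Z)S) \le EZ + E\left(YK^2/(\nu + K^2)\right) = EX - E\left(Y\nu/(\nu + K^2)\right).$

We have $E(Y\nu/(\nu + K^2)) \ge \alpha/(\nu + d)$ where $\alpha := \delta^2\nu/[4(\nu + d)(\nu + K^2)]$. Thus

$(\nu + d)E(XS) = (\nu + d)\int\frac{(z'A^2z)(z'z)}{(\nu + z'z)^2}\,dq(z) \le 1 - \alpha.$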

This implies, by the proof of Lemma 8, that the eigenvalues of the Hessian $\mathcal{H}_I$ for $q$ at $I$ are all at least $\alpha$ and those of the Hessian $\mathcal{H}_A$ for $Q$ at $A$ are at least $\alpha' := \delta^2\alpha$. Here $\alpha'$ depends on $\delta$, $M$, $\nu$, and $d$, but not otherwise on $Q \in \mathcal{Q}$. Bounds in the proof of Theorem 20 hold uniformly: specifically, in (63), $\|\mathcal{H}^{-1}\| \le 1/(\delta^2\alpha)$ and the entries of $G_{(\nu)}(\cdot, A_\nu)$ belong to $\mathcal{F}_{\delta,k+2,\nu}$, a uniform Donsker class. The remainder term $\sqrt{n}\,o(\|Q_n - Q\|_{\delta,k+2,\nu})$ in (62) is $o_p(1)$ uniformly over $\mathcal{Q}$ by (61), since each $\mathcal{F}_{\delta,k+2,\nu}$ is a uniform Donsker class. It follows that asymptotic normality of $\sqrt{n}(DA_\nu)(Q_n - Q)$ holds uniformly for $Q \in \mathcal{Q}$.

It remains to show that $\Pr(Q_n \in \mathcal{U}_{d,\nu+d})$, the probability that $A_\nu(Q_n)$ is defined, converges to 1 as $n \to \infty$ at a rate uniform over $Q \in \mathcal{Q}$. The class of all vector subspaces of $\mathbb{R}^d$ is a VC class of sets with suitable measurability, so it is a uniform Glivenko-Cantelli class by Dudley, Giné and Zinn (1991, Theorem 6). For $q = 0, 1, \dots, d - 1$, let $\mathcal{J}(q)$ be the class of all $q$-dimensional vector subspaces of $\mathbb{R}^d$. We need to show that for each $q$,

(67)  $\sup_{Q \in \mathcal{Q}}\ \sup_{H \in \mathcal{J}(q)} Q(H) < 1 - \frac{d - q}{\nu + d}.$

We can restrict to $Q$ with $A_\nu(Q) = I_d$ without changing the suprema of $Q$ over subspaces, replacing again $M$ by $K := M/\sqrt{\delta}$. Then we can fix $H \in \mathcal{J}(q)$ and let $Q$ vary. Let $|z|_{(q)}^2 := z_{q+1}^2 + \cdots + z_d^2$. By choice of coordinates we can take $H = \{z : |z|_{(q)}^2 = 0\}$. For each $Q \in \mathcal{Q}$, since $A_\nu(Q)$ is defined, we have $Q(H^c) > (d - q)/(\nu + d) \ge 1/(\nu + d)$. We also have by (66) $Q(|z| > M) < (1 - \delta)/(\nu + d)$, so $Q(H^c \cap \{|z| \le M\}) > \delta/(\nu + d)$. Now

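$\frac{d - q}{\nu + d} = \int\frac{|z|_{(q)}^2\,dQ(z)}{\nu + z'z} \le Q(H^c) - \frac{\delta}{\nu + d}\cdot\frac{\nu}{\nu + M^2} = Q(H^c) - \frac{\delta\nu}{(\nu + d)(\nu + M^2)},$

where the equality uses $A_\nu(Q) = I_d$, and the inequality holds because the integrand vanishes on $H$, is less than 1 everywhere, and is at most $M^2/(\nu + M^2) = 1 - \nu/(\nu + M^2)$ on $\{|z| \le M\}$. It follows that, replacing $M$ by $K$ to allow for the transformation,

$Q(H) \le 1 - \frac{d - q}{\nu + d} - \frac{\delta\nu}{(\nu + d)(\nu + K^2)}$ for all $Q \in \mathcal{Q}$,

which implies (67) and so finishes the proof of part (a).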

As part of the proof of part (b), the next fact will show that the special-case tightness hypothesis (66) itself implies a bound on $\|A_\nu(Q)\|$ (although not, of course, on $\|A_\nu(Q)^{-1}\|$). A bound exists since $A_\nu$ has a breakdown point of $1/(\nu + d)$ with regard to mass going to infinity [Tyler (1986, §3); Dümbgen and Tyler (2005, Theorem 5 and its proof)]. The next lemma provides specific constants which may not be sharp.

Lemma 23. If $Q \in \mathcal{U}_{d,\nu+d}$, then (66) implies $\|A_\nu(Q)\| \le M^2(\nu + d - \delta)/(\delta\nu)$.

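Proof. $A_\nu(Q) \in \mathbb{P}_d$ exists by Theorem 6(a). Take coordinates in which $A := A_\nu(Q)$ is diagonalized with eigenvalues $1/\tau_i$, $i = 1, \dots, d$. We then have by (IHl) and $u_\nu(s) = (\nu + d)/(\nu + s)$ (just after (33)) that

$\frac{1}{\tau_i} = (\nu + d)\int\frac{x_i^2\,dQ(x)}{\nu + \sum_{j=1}^d\tau_jx_j^2}$

for $i = 1, \dots, d$. The integral over $\{|x| > M\}$ is at most $(1 - \delta)/[(\nu + d)\tau_i]$ by (66). For $|x| \le M$ we have

$\frac{x_i^2}{\nu + \sum_j\tau_jx_j^2} \le \frac{x_i^2}{\nu + \tau_ix_i^2} \le \frac{M^2}{\nu + \tau_iM^2}.$

Thus $\delta/\tau_i \le (\nu + d)M^2/(\nu + \tau_iM^2)$, $\tau_i \ge \delta\nu/[M^2(\nu + d - \delta)]$ for all $i$, and the lemma follows. □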
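As a numerical sanity check of Lemma 23, the following sketch computes the scatter-only $t$ functional $A_\nu(Q)$ (location held at 0) for an empirical law by the fixed-point iteration $A \leftarrow (\nu + d)\int xx'/(\nu + x'A^{-1}x)\,dQ(x)$, i.e. with the weight $u_\nu(s) = (\nu + d)/(\nu + s)$ recalled in the proof. We assume, without proving it here, that this iteration converges from $A = I$ when $Q \in \mathcal{U}_{d,\nu+d}$; the code just stops on a tolerance and raises an error otherwise. The simulated data, $\delta = 0.5$, and the choice of $M$ as an empirical quantile are hypothetical choices made so that (66) holds.

```python
import numpy as np


def t_scatter_fixed_point(x, nu, tol=1e-10, max_iter=10_000):
    """Scatter-only t functional A for the empirical law of the rows of x (location fixed at 0).

    Iterates A <- (nu + d) * mean_i x_i x_i' / (nu + x_i' A^{-1} x_i).
    """
    n, d = x.shape
    a = np.eye(d)
    for _ in range(max_iter):
        s = np.einsum("ij,jk,ik->i", x, np.linalg.inv(a), x)  # s_i = x_i' A^{-1} x_i
        w = (nu + d) / (nu + s)                                # u_nu(s_i)
        a_new = (x * w[:, None]).T @ x / n
        if np.max(np.abs(a_new - a)) < tol:
            return a_new
        a = a_new
    raise RuntimeError("fixed-point iteration did not converge")


def lemma23_bound(m, nu, d, delta):
    """The bound M^2 (nu + d - delta) / (delta nu) on ||A_nu(Q)|| from Lemma 23."""
    return m**2 * (nu + d - delta) / (delta * nu)


rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=(500, 2))
nu, d, delta = 3.0, 2, 0.5
a = t_scatter_fixed_point(x, nu)
norms = np.linalg.norm(x, axis=1)
m = np.quantile(norms, 0.95)  # empirical mass beyond m is about 0.05 <= (1 - delta)/(nu + d) = 0.1
print(np.linalg.eigvalsh(a).max(), lemma23_bound(m, nu, d, delta))
```

The largest eigenvalue of $A_\nu(Q_n)$ should sit well below the bound, consistent with the remark that the constants in Lemma 23 may not be sharp.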

Now to prove Proposition 22 part (b), i.e. as it relates to Theorem 20(b), let $\mathcal{P}$ be the class of laws satisfying the hypotheses. For $P \in \mathcal{P}$, let $Q := P \circ T_1^{-1}$ as usual. Then (66) holds for $Q$ with $M + 1$ in place of $M$. By Proposition 5, since $\nu > 1$ in part (b), $Q \in \mathcal{U}_{d+1,\nu+d}$. By Lemma 23, the norms $\|A_{\nu'}(Q)\|$ are bounded uniformly for $P \in \mathcal{P}$ (recall $\nu' = \nu - 1 > 0$). Next, $\det\Sigma_\nu(P) = \det A_{\nu'}(Q)$ by (28) with $\gamma = 1$, which holds by Theorem 6(b). This determinant is bounded below by $\|\Sigma_\nu^{-1}(P)\|^{-d} \ge \delta^d$, so the smallest eigenvalue of $A_{\nu'}(Q)$ is bounded below by $\delta^d\|A_{\nu'}(Q)\|^{-d}$, and $\|A_{\nu'}(Q)^{-1}\| \le \|A_{\nu'}(Q)\|^d/\delta^d$, which is bounded uniformly for $P \in \mathcal{P}$.

Thus all the hypotheses of part (a) hold for $d + 1$ and $\nu - 1$ in place of $d$ and $\nu$, and some $\delta' > 0$ in place of $\delta$, depending on $Q$ and $P$ only insofar as the hypotheses of part (b) hold, so part (a) gives uniform asymptotic normality of $\sqrt{n}[A_{\nu'}(Q_n) - A_{\nu'}(Q)]$ over all $P \in \mathcal{P}$. Taking the last column, that directly gives uniform asymptotic normality of $\sqrt{n}(\mu_\nu(P_n) - \mu_\nu(P))$. For $\sqrt{n}(\Sigma_\nu(P_n) - \Sigma_\nu(P))$ one can apply the delta-method for products, Lemma 21, which works uniformly for $|\mu_\nu(P)|$ bounded, as they are, so Proposition 22 is proved. □



8. Norms based on classes of sets 

Suppose $\|\cdot\|_1$ and $\|\cdot\|_2$ are two norms on a vector space $V$ such that for some $K < \infty$, $\|x\|_2 \le K\|x\|_1$ for all $x \in V$. Let $U \subset V$ be open for $\|\cdot\|_2$ and so also for $\|\cdot\|_1$. Let $v \in U$ and suppose a functional $T$ from $U$ into some other normed space is Fréchet differentiable at $v$ for $\|\cdot\|_2$. Then the same holds for $\|\cdot\|_1$, since the identity from $V$ to $V$ is a bounded linear operator from $(V, \|\cdot\|_1)$ to $(V, \|\cdot\|_2)$ and so equals its own Fréchet derivative everywhere on $V$, and we can apply a chain rule, e.g. Dieudonné [1960, (8.12.10)].

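If $\mathcal{F}$ is a class of bounded real-valued functions on a set $X$, measurable for a $\sigma$-algebra $\mathcal{A}$ of subsets of $X$, and $\phi$ is a finite signed measure on $\mathcal{A}$ (e.g. $P_n - P$), let $\|\phi\|_{\mathcal{F}} := \sup_{f \in \mathcal{F}}\left|\int f\,d\phi\right|$. For $\mathcal{C} \subset \mathcal{A}$ let $\|\phi\|_{\mathcal{C}} := \|\phi\|_{\mathcal{G}}$ where $\mathcal{G} := \{1_C : C \in \mathcal{C}\}$.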



Let $\mathcal{F}$ be a VC major class of functions for $\mathcal{E}$ (defined in Dudley [1999, pp. 159-160]), where $\mathcal{E} \subset \mathcal{A}$, and suppose for some $M < \infty$, $|f(x)| \le M$ for all $f \in \mathcal{F}$ and $x \in X$. Then for any finite signed measure $\phi$ on $\mathcal{A}$ having total mass $\phi(X) = 0$ (e.g., $\phi = P - Q$ for any two laws $P$ and $Q$), we have

(68)  $\|\phi\|_{\mathcal{F}} \le 2M\|\phi\|_{\mathcal{E}}$

by the rescaling $f \mapsto (f + M)/(2M)$ to get functions with values in $[0, 1]$ and then a convex hull representation [Dudley (1987, Theorem 2.1(a)) or (1999, Theorem 4.7.1(b))]; additive constants make no difference since $\phi(X) = 0$.
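For a single function $f$, the mechanism behind (68) is the layer-cake identity: with $g := (f + M)/(2M)$ taking values in $[0, 1]$ and $\phi(X) = 0$, $\int f\,d\phi = 2M\int g\,d\phi = 2M\int_0^1\phi(\{g > t\})\,dt$, and each $\{g > t\}$ is an upper level set of $f$. The following sketch (our own illustration on a finite set, with exact rational arithmetic; the function name is hypothetical) computes both sides and the resulting bound $2M\sup_t|\phi(\{g > t\})|$. It does not replace the convex hull argument, which is what handles the whole class $\mathcal{F}$ at once.

```python
from fractions import Fraction


def layer_cake_check(f_vals, phi, m):
    """Return (int f dphi, 2M * int_0^1 phi({g > t}) dt, 2M * sup_t |phi({g > t})|).

    f_vals[i] and phi[i] are the value of f and the phi-mass at point i of a finite set;
    phi must have total mass 0 and |f| <= M.
    """
    assert sum(phi) == 0, "phi must have total mass 0"
    assert all(abs(v) <= m for v in f_vals), "f must be bounded by M"
    g = [(v + m) / (2 * m) for v in f_vals]  # rescaled values in [0, 1]
    integral_f = sum(v * w for v, w in zip(f_vals, phi))
    cuts = sorted(set([Fraction(0), Fraction(1)] + g))
    layer, sup_level = Fraction(0), Fraction(0)
    for lo, hi in zip(cuts, cuts[1:]):
        # for every t in [lo, hi) the level set {g > t} equals {g > lo}
        mass = sum(w for gv, w in zip(g, phi) if gv > lo)
        layer += (hi - lo) * mass
        sup_level = max(sup_level, abs(mass))
    return integral_f, 2 * m * layer, 2 * m * sup_level


f = [Fraction(3), Fraction(-1), Fraction(2), Fraction(-3), Fraction(0)]
phi = [Fraction(1, 4), Fraction(-1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(0)]
print(layer_cake_check(f, phi, 3))
```

The first two returned values agree exactly, and the third dominates their absolute value, which is (68) for the one-element class $\{f\}$ with $\mathcal{E}$ containing the level sets of $f$.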

As noted in the proof of Theorem 20, each $\mathcal{F}_{\delta,k+2,\nu}$ is a uniformly bounded VC major class for the VC class $\mathcal{E}(2k + 4, d)$ of sets (positivity sets of polynomials of degree $\le 2k + 4$). So by (61) and (68), for some $M < \infty$ depending on $r$, $\delta$, $\nu$, and $d$, we have

(69)  $\|\phi\|_{\delta,k+2,\nu} \le 2M\|\phi\|_{\mathcal{E}(2k+4,d)}$

for all finite signed measures $\phi$ on $\mathbb{R}^d$ with $\phi(\mathbb{R}^d) = 0$. We have by the preceding discussion:

Corollary 24. For each $d = 1, 2, \dots$ and $\nu > 1$, the Fréchet differentiability property of the $t_\nu$ location and scatter functionals at each $P$ in $\mathcal{V}_{d,\nu+d}$, as shown in Theorem 18 with respect to $\|\cdot\|_{\delta,k+2,\nu}$, also holds with respect to $\|\cdot\|_{\mathcal{E}(2k+4,d)}$.

Each class $\mathcal{E}(r, d)$ for $r = 1, 2, \dots$ is invariant under all nonsingular affine transformations of $\mathbb{R}^d$, and hence so is the norm $\|\cdot\|_{\mathcal{E}(r,d)}$. Davies (1993, pp. 1851-1852) defines norms $\|\cdot\|_{\mathcal{C}}$ based on suitable VC classes $\mathcal{C}$ of subsets of $\mathbb{R}^d$ and points out Donsker and affine invariance properties. The norms $\|\cdot\|_{\delta,r,\nu}$ are not affinely invariant.

On the other hand, note that $M$ in (69) depends on $\delta$, and there is no corresponding inequality in the opposite direction. Thus, Fréchet differentiability is strictly stronger for $\|\cdot\|_{\delta,k+2,\nu}$ than it is for $\|\cdot\|_{\mathcal{E}(2k+4,d)}$.

9. The one-dimensional case

In dimension $d = 1$, the scatter matrix $\Sigma$ reduces to a number $\sigma^2$. The $\rho$ and $h$ functions in this case become, for $\theta := (\mu, \sigma)$ with $\sigma > 0$, by (31) and (32),

(70)  $\rho_\nu(x, \theta) := \log\sigma + \frac{\nu + 1}{2}\log\left(1 + \frac{(x - \mu)^2}{\nu\sigma^2}\right),$

(71)  $h_\nu(x, \theta) := \log\sigma + \frac{\nu + 1}{2}\log\left(\frac{\nu\sigma^2 + (x - \mu)^2}{\sigma^2(\nu + x^2)}\right).$

The function $h_\nu$ is bounded uniformly in $x$ and for $|\mu|$ bounded and $\sigma$ bounded away from $0$ and $\infty$. Thus it is integrable for any probability distribution $P$ on $\mathbb{R}$. Let $Ph_\nu(\theta) := \int h_\nu(x, \theta)\,dP(x)$. In the next theorem, extended M-functionals are defined by ([I]) with $\theta := (\mu, \sigma) \in \Theta = \mathbb{R} \times (0, \infty)$ and $\bar\Theta = \mathbb{R} \times [0, \infty)$.

Theorem 25. Let $d = 1$ and $1 < \nu < \infty$. Then:

(a) For any law $Q$ on $\mathbb{R}$ satisfying

(72)  $\max_t Q(\{t\}) < \nu/(\nu + 1),$

the M-functional $(\mu, \sigma) = (\mu_\nu, \sigma_\nu)(Q)$ exists with $\sigma_\nu(Q) > 0$ and is the unique critical point with $\partial Qh_\nu/\partial\mu = \partial Qh_\nu/\partial\sigma = 0$. On the set of laws satisfying (72), $(\mu_\nu, \sigma_\nu)$ is analytic with respect to the dual-bounded-Lipschitz norm and thus weakly continuous.

(b) For any law $Q$ on $\mathbb{R}$, the extended M-functional $\theta_0(Q) := (\mu_\nu, \sigma_\nu)(Q) \in \bar\Theta$ exists for $h_\nu$ from (71).

(c) If $Q(\{s\}) \ge \nu/(\nu + 1)$ for some (unique) $s$, then $\mu_\nu(Q) = s$ and $\sigma_\nu(Q) = 0$.

(d) The map $Q \mapsto \theta_0(Q)$ is weakly continuous at every law $Q$. For $X_1, X_2, \dots$ i.i.d. $(Q)$ and empirical measures $Q_n := n^{-1}\sum_{j=1}^n\delta_{X_j}$, we thus have maximum likelihood estimates $\theta_n = \theta_0(Q_n)$ existing for all $n$ and converging to $\theta_0(Q)$ almost surely.

Remark. The theorem doesn't extend to $0 < \nu \le 1$. For some $Q$, points $s$ in part (c) are not unique. For example if $\nu = 1$ (the Cauchy case) and $Q = \frac{1}{2}(\delta_{-1} + \delta_1)$, the likelihood is maximized on the semicircle $\mu^2 + \sigma^2 = 1$, as Copas (1975) noted.
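To see parts (c) and (d) at work on empirical laws, one can compute $\theta_0(Q_n)$ numerically. The sketch below uses the iteratively reweighted (EM-type) updates $\mu \leftarrow \sum_i w_ix_i/\sum_i w_i$ and $\sigma^2 \leftarrow n^{-1}\sum_i w_i(x_i - \mu)^2$ with $w_i = (\nu + 1)\sigma^2/(\nu\sigma^2 + (x_i - \mu)^2)$. This is a standard scheme for the $t$ likelihood with known $\nu$, not a method taken from this paper; its fixed points with $\sigma > 0$ are exactly the solutions of $\partial Qh_\nu/\partial\mu = \partial Qh_\nu/\partial\sigma = 0$. The starting values, iteration count and the cutoff `sigma_floor`, below which we declare $\sigma = 0$ (the degenerate case of part (c)), are our assumptions.

```python
import math


def t_location_scale_em(xs, nu, iters=2000, sigma_floor=1e-12):
    """Sketch of an EM-type fixed-point iteration for the 1-d t_nu M-functional (mu, sigma).

    Returns (mu, 0.0) if sigma collapses below sigma_floor, as for a law with an atom
    of mass >= nu/(nu + 1).
    """
    xs = sorted(xs)
    n = len(xs)
    mu, sigma2 = xs[n // 2], 1.0  # hypothetical starting values
    for _ in range(iters):
        w = [(nu + 1) * sigma2 / (nu * sigma2 + (x - mu) ** 2) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        sigma2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n
        if sigma2 < sigma_floor**2:
            return mu, 0.0
    return mu, math.sqrt(sigma2)


print(t_location_scale_em([0.0] * 9 + [5.0], 3.0))  # atom of mass 0.9 >= 3/4 at 0
print(t_location_scale_em([-2.1, -0.7, 0.0, 0.3, 0.9, 1.4, 2.2, 3.8, 7.5], 3.0))
```

In the first call the atom at $0$ has mass $0.9 \ge \nu/(\nu + 1) = 0.75$, and the iterates collapse geometrically toward $(0, 0)$, matching part (c); in the second call no atom is large, and the iteration settles at an interior critical point.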

Proof. Part (a) holds by the case of general dimension, Theorem 9(d), since $\sigma^2 \mapsto \sigma$ is analytic for $\sigma > 0$. The other parts are special to $d = 1$.

Let $D := (x - \mu)^2 + \nu\sigma^2$. Let $\nu > 1$ be fixed for the present and let $\rho = \rho_\nu$ and $h = h_\nu$. It's immediate from (70) and (71) that for any $\theta = (\mu, \sigma)$ with $0 < \sigma < \infty$ and any $x \in \mathbb{R}$,

(73)  $\frac{\partial h(x, \theta)}{\partial\mu} = \frac{\partial\rho(x, \theta)}{\partial\mu} = \frac{(\nu + 1)(\mu - x)}{D},$

(74)  $\frac{\partial h(x, \theta)}{\partial\sigma} = \frac{\partial\rho(x, \theta)}{\partial\sigma} = \frac{1}{\sigma}\left[1 - \frac{(\nu + 1)(x - \mu)^2}{D}\right].$

It's easily seen that for any $K > 0$ and all real $y$,

(75)  $|y|/(K + y^2) \le 1/(2\sqrt{K}).$

It follows directly that for any $x$ and $\mu$, any $\sigma > 0$ and any $\nu \ge 1$, the partial derivatives (73) and (74) each have absolute value $\le \nu/\sigma$, so for any $\delta > 0$, they are bounded uniformly for $\sigma \ge \delta$. For $\theta = (0, 1)$ we have $h(x, \theta) = 0$. Thus for any $\mu$ and $0 < \sigma < \infty$,

(76)  $|h(x, \theta)| \le \nu(|\log\sigma| + |\mu|/\sigma),$

so $h$ is bounded uniformly for $\mu$ bounded and $\delta \le \sigma \le 1/\delta$.

From (74) we see that $\partial Qh(\theta)/\partial\sigma = 0$ if and only if

(77)  $F(\mu, \sigma) := \int\frac{(x - \mu)^2}{\nu\sigma^2 + (x - \mu)^2}\,dQ(x) = \frac{1}{\nu + 1}.$

As $\sigma$ decreases from $+\infty$ down to $0$, the integrand increases from $0$ up to $1_{\{x \ne \mu\}}$, strictly for $x \ne \mu$. Thus the integral increases from $0$ up to $Q(\{\mu\}^c)$, strictly unless $Q(\{\mu\}) = 1$. So (77) for a fixed $\mu$ has a solution $\sigma := \sigma(\mu) > 0$ (depending on $\nu$ and $Q$) if and only if $Q(\{\mu\}^c) > 1/(\nu + 1)$, and the solution is unique. Then, moreover, $\partial Qh(\theta)/\partial\sigma$ will be $< 0$ for $0 < \sigma < \sigma(\mu)$ and $> 0$ for $\sigma > \sigma(\mu)$, so that $Qh(\mu, \sigma)$ has its unique minimum for the given $\mu$ at $\sigma = \sigma(\mu)$.

If $Q(\{\mu\}) \ge \nu/(\nu + 1)$, then $\sigma(\mu)$ is set equal to $0$ (e.g. Copas [1975]), which is natural since for the given $\mu$, $Qh(\mu, \sigma)$ has its smallest values as $\sigma \downarrow 0$.
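For a law with finitely many atoms, $\sigma(\mu)$ can be computed directly from (77), since $F(\mu, \cdot)$ is strictly decreasing. The following bisection sketch (function names ours) returns $0$ under the convention just stated when $Q(\{\mu\}) \ge \nu/(\nu + 1)$, and otherwise brackets and bisects the unique root. In the usage lines, for $Q = 0.7\delta_0 + 0.3\delta_1$ and $\nu = 3$ the root at $\mu = 0.1$ is $\sigma = 0.3$ (this can be checked by hand from (77)).

```python
def F(mu, sigma, xs, ws, nu):
    """F(mu, sigma) of (77) for the discrete law Q = sum_k ws[k] * delta_{xs[k]}."""
    return sum(w * (x - mu) ** 2 / (nu * sigma**2 + (x - mu) ** 2)
               for x, w in zip(xs, ws) if x != mu)


def sigma_of_mu(mu, xs, ws, nu, iters=200):
    """Unique root sigma(mu) of F(mu, .) = 1/(nu + 1), or 0 if Q({mu}) >= nu/(nu + 1)."""
    target = 1.0 / (nu + 1)
    atom = sum(w for x, w in zip(xs, ws) if x == mu)
    if atom >= nu / (nu + 1):
        return 0.0
    hi = 1.0
    while F(mu, hi, xs, ws, nu) > target:  # F decreases to 0 as sigma grows
        hi *= 2.0
    lo = hi
    while F(mu, lo, xs, ws, nu) < target:  # F increases to Q({mu}^c) > target as sigma -> 0
        lo /= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mu, mid, xs, ws, nu) > target:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(sigma_of_mu(0.1, [0.0, 1.0], [0.7, 0.3], 3.0))  # close to 0.3
print(sigma_of_mu(0.0, [0.0, 1.0], [0.8, 0.2], 3.0))  # atom 0.8 >= 3/4, so 0.0
```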

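Taking second partial derivatives we get

(78)  $\partial^2h/\partial\mu^2 = (\nu + 1)[\nu\sigma^2 - (x - \mu)^2]D^{-2},$

(79)  $\partial^2h/\partial\sigma\partial\mu = 2(\nu + 1)\nu\sigma(x - \mu)D^{-2},$

(80)  $\frac{\partial^2h}{\partial\sigma^2} = \frac{1}{\sigma^2}\left[\frac{(\nu + 1)(x - \mu)^2}{D} - 1\right] + \frac{2(\nu + 1)\nu(x - \mu)^2}{D^2}.$

It's easily seen that these second partials are also bounded uniformly for $\sigma \ge \delta$, for any $\delta > 0$.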
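Formulas (73), (74) and (78)-(80) are easy to get wrong by a sign or a factor, so a finite-difference check is a cheap safeguard. The sketch below codes $h_\nu$ as in (71) and compares the analytic partials with central differences at a few points; the step size `e` and the tolerance are our choices, not values from the paper.

```python
import math


def h(x, mu, sigma, nu):
    """h_nu(x, (mu, sigma)) as in (71)."""
    j = (nu * sigma**2 + (x - mu) ** 2) / (sigma**2 * (nu + x**2))
    return math.log(sigma) + 0.5 * (nu + 1) * math.log(j)


def analytic_partials(x, mu, sigma, nu):
    r2 = (x - mu) ** 2
    D = r2 + nu * sigma**2
    return {
        "mu": (nu + 1) * (mu - x) / D,                                 # (73)
        "sigma": (1 - (nu + 1) * r2 / D) / sigma,                      # (74)
        "mu_mu": (nu + 1) * (nu * sigma**2 - r2) / D**2,               # (78)
        "sigma_mu": 2 * (nu + 1) * nu * sigma * (x - mu) / D**2,       # (79)
        "sigma_sigma": ((nu + 1) * r2 / D - 1) / sigma**2
        + 2 * (nu + 1) * nu * r2 / D**2,                               # (80)
    }


def numeric_partials(x, mu, sigma, nu, e=1e-4):
    def f(m, s):
        return h(x, m, s, nu)

    return {
        "mu": (f(mu + e, sigma) - f(mu - e, sigma)) / (2 * e),
        "sigma": (f(mu, sigma + e) - f(mu, sigma - e)) / (2 * e),
        "mu_mu": (f(mu + e, sigma) - 2 * f(mu, sigma) + f(mu - e, sigma)) / e**2,
        "sigma_mu": (f(mu + e, sigma + e) - f(mu + e, sigma - e)
                     - f(mu - e, sigma + e) + f(mu - e, sigma - e)) / (4 * e**2),
        "sigma_sigma": (f(mu, sigma + e) - 2 * f(mu, sigma) + f(mu, sigma - e)) / e**2,
    }


for point in [(0.3, -0.2, 0.7, 2.0), (2.5, 1.0, 1.3, 5.0), (-1.0, 0.4, 0.9, 1.5)]:
    a, n_ = analytic_partials(*point), numeric_partials(*point)
    print(point, max(abs(a[k] - n_[k]) for k in a))
```

The printed maximal discrepancies should be tiny (limited by the $O(e^2)$ truncation and roundoff), and $h(x, (0, 1)) = 0$ holds for every $x$, as used for (76).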

The following shows that $\sigma(\cdot)$ is $C^1$ and strictly positive except possibly at one large atom. (Here $C^1$ suffices for present purposes; it could be improved to analyticity, as in the proof of Theorem [n](c).)

Lemma 26. On the set $U := U_{\nu,Q}$ of $\mu$ for which $Q(\{\mu\}) < \nu/(\nu + 1)$, namely the whole line if (72) holds or the complement of a point if it fails, the function $\mu \mapsto \sigma(\mu) > 0$ is $C^1$, as is the function $\mu \mapsto Qh(\mu, \sigma(\mu))$.

Proof. For each $\mu \in U$, we have $\sigma(\mu) > 0$, where $\sigma(\mu)$ is defined after (77) as the unique solution of $F(\mu, \sigma) = 1/(\nu + 1)$. By (79), (80), and dominated convergence, $F$ is $C^1$. We have

$\partial F(\mu, \sigma)/\partial\sigma = -2\nu\sigma\int(x - \mu)^2D^{-2}\,dQ(x) < 0$

for all $\mu \in U$ and all $\sigma > 0$. It follows from the implicit function theorem (e.g. Rudin (1976, Theorem 9.28)) that $\sigma(\cdot)$ is a $C^1$ function on $U$. Also, the function $(\mu, \sigma) \mapsto Qh(\mu, \sigma)$ is $C^1$ for $\sigma > 0$ by (73) and (74) and their integrated versions. Thus $\mu \mapsto Qh(\mu, \sigma(\mu))$ is $C^1$ on $U$, proving the lemma. □
The following fact for laws concentrated in two points will be helpful, also in the Remark showing that $\sigma_\nu$ is non-Lipschitz at the end of this section.

Lemma 27. Let $\nu > 1$ and $Q = q\delta_a + p\delta_b$ where $a < b$ and $0 < p = 1 - q < 1$.

(a) If $1/(\nu + 1) < p < \nu/(\nu + 1)$, then $Qh_\nu$ has a unique critical point $(\mu_p, \sigma_p)$, with $\sigma_p > 0$, at which the Hessian of $Qh$ is strictly positive definite. Explicitly, for $a = 0$ and $b = 1$,

(81)  $\mu_p = \frac{\nu p - q}{\nu - 1}, \qquad \sigma_p^2 = \mu_p(1 - \mu_p) = \frac{(\nu^2 + 1)pq - \nu(p^2 + q^2)}{(\nu - 1)^2}.$

(b) If $p \le 1/(\nu + 1)$ or $p \ge \nu/(\nu + 1)$, then an M-functional $(\mu, \sigma) = (\mu_\nu(Q), \sigma_\nu(Q))$ exists with $\sigma_\nu(Q) = 0$ and $\mu_\nu(Q) = a$ or $b$ respectively.

Proof. By an affine transformation we can assume that $a = 0$ and $b = 1$. For part (a), the equation $\partial Qh/\partial\mu = 0$ from (73), the equation $\partial Qh/\partial\sigma = 0$ from (74) and (77), and straightforward calculations give the unique solutions (81) for a critical point. Then $0 < \mu_p < 1$ by the hypotheses on $p$. For each $\nu > 1$, $d\sigma_p^2/dp = 0$ only at $p = 1/2$, where $\sigma_p^2 = 1/4$, a maximum. Also, $\sigma_p^2 \downarrow 0$ strictly as $p \downarrow 1/(\nu + 1)$ or $p \uparrow \nu/(\nu + 1)$. Thus $\sigma_p > 0$ for $1/(\nu + 1) < p < \nu/(\nu + 1)$ as assumed, and $(\mu_p, \sigma_p)$ is the unique critical point of $Qh$.

By Theorem 6 and Lemma [H], the Hessian of $Qh$ as a function of $A \in \mathbb{P}_2$ at $A = A_{\nu-1}(Q \circ T_1^{-1})$ is positive definite. This remains true restricted to the subset where $\gamma = A_{22} = 1$ in Proposition [Hl](a), so that $A = \begin{pmatrix}\sigma^2 + \mu^2 & \mu\\ \mu & 1\end{pmatrix}$, since, in suitable coordinates, a principal submatrix of a positive definite matrix is positive definite. It follows that the Hessian of $Qh$ with respect to $(\mu, \sigma)$ at $(\mu_p, \sigma_p)$ is positive definite. So part (a) of Lemma 27 is proved.
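The closed form (81) can be checked against (73), (74) and (78)-(80) directly. The sketch below (our own, for $a = 0$, $b = 1$; function names hypothetical) evaluates the integrated gradient and Hessian of $Qh$ at $(\mu_p, \sigma_p)$ for a few values of $p$ in $(1/(\nu + 1), \nu/(\nu + 1))$. The gradient should vanish and the $2 \times 2$ Hessian should have a positive leading entry and a positive determinant.

```python
def two_point_critical_point(p, nu):
    """(mu_p, sigma_p) from (81) for Q = q delta_0 + p delta_1, q = 1 - p."""
    q = 1.0 - p
    mu = (nu * p - q) / (nu - 1)
    return mu, (mu * (1.0 - mu)) ** 0.5


def integrated_derivatives(mu, sigma, p, nu):
    """Gradient (73)-(74) and Hessian (78)-(80) of Qh for Q = (1 - p) delta_0 + p delta_1."""
    g_mu = g_s = h_mm = h_ms = h_ss = 0.0
    for x, w in ((0.0, 1.0 - p), (1.0, p)):
        r2 = (x - mu) ** 2
        D = r2 + nu * sigma**2
        g_mu += w * (nu + 1) * (mu - x) / D
        g_s += w * (1 - (nu + 1) * r2 / D) / sigma
        h_mm += w * (nu + 1) * (nu * sigma**2 - r2) / D**2
        h_ms += w * 2 * (nu + 1) * nu * sigma * (x - mu) / D**2
        h_ss += w * (((nu + 1) * r2 / D - 1) / sigma**2 + 2 * (nu + 1) * nu * r2 / D**2)
    return (g_mu, g_s), ((h_mm, h_ms), (h_ms, h_ss))


for p, nu in [(0.3, 3.0), (0.5, 3.0), (0.6, 3.0), (0.45, 1.8)]:
    mu, s = two_point_critical_point(p, nu)
    (g1, g2), ((a11, a12), (_, a22)) = integrated_derivatives(mu, s, p, nu)
    print(p, nu, (mu, s), (g1, g2), a11, a11 * a22 - a12**2)
```

For example, $p = 0.3$ and $\nu = 3$ give $(\mu_p, \sigma_p) = (0.1, 0.3)$.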

Now for part (b), we can assume by symmetry that $p \le 1/(\nu + 1)$ and want to prove that $\mu_\nu = \sigma_\nu = 0$ are the M-functionals of $Q$. For all $\mu \ne 0$, by Lemma 26, $\sigma(\mu) > 0$ is defined such that $Qh(\mu, \sigma)$ is minimized for the given $\mu$ at $\sigma = \sigma_\mu := \sigma(\mu)$. (The notations $\sigma_\mu$ and $\sigma_p$ are different.) Let $(Qh)(\mu) := (Qh)(\mu, \sigma(\mu))$ for $\mu \ne 0$, a $C^1$ function of $\mu$ by Lemma 26. Showing that $d(Qh)(\mu)/d\mu$ has the same sign as $\mu$ for $\mu \ne 0$ is equivalent, by (73) and since $\partial Qh(\mu, \sigma)/\partial\sigma|_{\sigma = \sigma(\mu)} = 0$, to showing that for $\mu \ne 0$,

(82)  $\mu\left[\frac{q\mu}{\nu\sigma_\mu^2 + \mu^2} - \frac{p(1 - \mu)}{\nu\sigma_\mu^2 + (\mu - 1)^2}\right] > 0.$

By (77) we have for $\mu \ne 0$

(83)  $\frac{q\mu^2}{\nu\sigma_\mu^2 + \mu^2} + \frac{p(1 - \mu)^2}{\nu\sigma_\mu^2 + (1 - \mu)^2} = \frac{1}{\nu + 1}.$

Combining, we want to show that $(\nu + 1)p(1 - \mu) < \nu\sigma_\mu^2 + (1 - \mu)^2$ for $0 < p \le 1/(\nu + 1)$. We need only consider $0 < \mu < 1$, since for $\mu < 0$ or $\mu \ge 1$ the sign condition (82) is immediate. If (82) fails, then for some such $p$ and $\mu$, $(\nu + 1)p(1 - \mu) - (1 - \mu)^2 \ge \nu\sigma_\mu^2$. Substituting in (83) gives, where the denominators are necessarily positive,

$\frac{q\mu^2}{(\nu + 1)p(1 - \mu) - 1 + 2\mu} + \frac{1 - \mu}{\nu + 1} \le \frac{1}{\nu + 1},$

so

$\frac{q\mu}{[(\nu + 1)p - 1](1 - \mu) + \mu} \le \frac{1}{\nu + 1},$

but $(\nu + 1)p - 1 \le 0$ implies the left side is at least $q = 1 - p \ge \nu/(\nu + 1) > 1/(\nu + 1)$ since $\nu > 1$, a contradiction. So (82) is proved. This implies that for any $\varepsilon > 0$,

(84)  $\inf\{(Qh)(\mu) : 0 < |\mu| < \varepsilon\} < \inf\{(Qh)(\mu) : |\mu| \ge \varepsilon\}.$

Next, if there is a sequence $\mu_j \to 0$ such that $\sigma(\mu_j) \ge \delta$ for some $\delta > 0$, then (83) gives a contradiction for $j$ large enough. So $\sigma(\mu) \to 0$ as $\mu \to 0$. This implies that for any $\gamma > 0$

$\inf\{Qh(\mu, \sigma) : |\mu| < \gamma,\ \sigma < \gamma\} < \inf\{Qh(\mu, \sigma) : |\mu| < \gamma,\ \sigma \ge \gamma\},$

because by (84), the infimum is smallest for $|\mu|$ smallest, and then $\sigma(\mu)$ becomes $< \gamma$, so $Qh$ for a given $\mu$ and $\sigma \ge \gamma$ is larger than at $\sigma(\mu)$. Also, by (74), $Qh(0, \sigma)$ is strictly decreasing as $\sigma \downarrow 0$. So part (b) of Lemma 27 is proved. □

Next, let's consider a general $Q$ such that (72) fails. The next fact, with part (a), implies parts (b) and (c) of Theorem 25.



Lemma 28. Let $\nu > 1$ and let $Q$ be a law on $\mathbb{R}$ such that for some $u$, $Q(\{u\}) \ge \nu/(\nu + 1)$. Then the (extended) M-functional of $Q$ for $\rho_\nu$ or $h_\nu$ exists with $\mu_\nu(Q) = u$ and $\sigma_\nu(Q) = 0$.

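Proof. Since $\nu > 1$, $u$ is uniquely determined. By a translation we can assume that $u = 0$. Then on the set $U := \{\mu \ne 0\}$, by Lemma 26, $\mu \mapsto \sigma_\mu := \sigma(\mu) > 0$ is a $C^1$ function, giving the infimum of $Qh(\mu, \sigma)$ for each $\mu \ne 0$. It will be shown that

(85)  $\mu\,d(Qh)(\mu, \sigma_\mu)/d\mu > 0$ for all $\mu \ne 0$.

This is immediate if $Q = \delta_0$ from (73), so we can assume for $\beta := Q(\{0\})$ that $\nu/(\nu + 1) \le \beta < 1$. By (77) and Lemma 26 we have for each $\mu \ne 0$ that $\sigma_\mu > 0$ and

(86)  $\frac{\beta\mu^2}{\nu\sigma_\mu^2 + \mu^2} + \int_{x \ne 0}\frac{(\mu - x)^2\,dQ(x)}{\nu\sigma_\mu^2 + (\mu - x)^2} = \frac{1}{\nu + 1}.$

To prove (85), we need to show by (73) that for $\mu \ne 0$

(87)  $\mu\int\frac{(\mu - x)\,dQ(x)}{\nu\sigma_\mu^2 + (\mu - x)^2} > 0.$

Combining (87) with (86), and writing $(\mu - x)^2 = \mu(\mu - x) + x(x - \mu)$, we need to show that for $\mu \ne 0$,

(88)  $\int\frac{x(x - \mu)\,dQ(x)}{\nu\sigma_\mu^2 + (\mu - x)^2} < \frac{1}{\nu + 1}.$

By (86), for $\mu \ne 0$,

(89)  $\int_{x \ne 0}\frac{(\mu - x)^2\,dQ(x)}{\nu\sigma_\mu^2 + (\mu - x)^2} = \frac{1}{\nu + 1} - \frac{\beta\mu^2}{\nu\sigma_\mu^2 + \mu^2}.$

Now (88) will follow from (89) and the Cauchy-Schwarz inequality (applied on $\{x \ne 0\}$, the only set where the integrand of (88) can be nonzero) if

$\int_{x \ne 0}\frac{x^2\,dQ(x)}{\nu\sigma_\mu^2 + (\mu - x)^2} < \frac{\nu\sigma_\mu^2 + \mu^2}{(\nu + 1)[\nu\sigma_\mu^2 + \mu^2 - (\nu + 1)\beta\mu^2]}.$

By (86) again, $(\nu + 1)\beta\mu^2 < \nu\sigma_\mu^2 + \mu^2$ unless $Q$ is concentrated at the two points $0, \mu$. That case is treated by Lemma 27(b), so we can neglect it here. Then the denominator of the last expression displayed is positive. Since $(\nu + 1)\beta \ge 1$ and $Q(x \ne 0) \le 1/(\nu + 1)$, it will suffice to show that for all real $x$ and $\mu \ne 0$ as always,

$\frac{x^2}{\nu\sigma_\mu^2 + (x - \mu)^2} \le \frac{\nu\sigma_\mu^2 + \mu^2}{\nu\sigma_\mu^2}.$

The fraction on the left goes to 1 as $x \to \pm\infty$, and there the inequality holds. At $x = 0$, a minimum of that fraction, the inequality also holds. Setting the derivative of the fraction equal to $0$ gives one other root, where $x = \mu + (\nu\sigma_\mu^2/\mu)$ and where the inequality holds (with equality just for this one value of $x$). Thus (88) and (85) are proved.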
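The last inequality is elementary but easy to misremember. The following sketch (names and test values ours; `c` stands for $\nu\sigma_\mu^2 > 0$) evaluates the fraction on a grid and at the interior critical point $x^* = \mu + c/\mu$, where equality with the bound $(c + \mu^2)/c$ should hold.

```python
def fraction_value(x, mu, c):
    """x^2 / (c + (x - mu)^2), with c playing the role of nu * sigma_mu^2 > 0."""
    return x * x / (c + (x - mu) ** 2)


def check_last_inequality(mu, c, span=50.0, grid=20001):
    """Return (max of the fraction on a grid, its value at x* = mu + c/mu, the bound (c + mu^2)/c)."""
    bound = (c + mu * mu) / c
    xs = [-span + 2 * span * k / (grid - 1) for k in range(grid)]
    worst = max(fraction_value(x, mu, c) for x in xs)
    x_star = mu + c / mu  # requires mu != 0
    return worst, fraction_value(x_star, mu, c), bound


for mu, c in [(0.5, 0.3), (-2.0, 1.5), (3.0, 0.05)]:
    print(mu, c, check_last_inequality(mu, c))
```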

The proof that $\mu_\nu(Q) = \sigma_\nu(Q) = 0$ is now completed as at the end of the proof of Lemma 27(b), where now if $\mu_j \to 0$ and $\sigma(\mu_j) \ge \delta > 0$, (86) is contradicted for $j$ large enough. So Lemma 28 is proved. □

It remains to prove part (d) of Theorem 25. To show the weak continuity of $\mu_\nu$ and $\sigma_\nu$ at a law $Q$ with $Q(\{t\}) \ge \nu/(\nu + 1)$ for some unique $t$, we can and do assume that $t = 0$. We want to show that if a sequence $P_k \to Q$ weakly, then $\mu_k := \mu_\nu(P_k) \to 0$ and $\sigma_k := \sigma_\nu(P_k) \to 0$. Taking subsequences, we can assume that $\mu_k \to \mu_0$ and $\sigma_k \to \sigma_0$ where $-\infty \le \mu_0 \le +\infty$ and $0 \le \sigma_0 \le +\infty$.

If $\sigma_k = 0$ for all $k$ then we have $P_k(\{t_k\}) \ge \nu/(\nu + 1)$ for some $t_k$. By weak convergence, we must have $t_k \to 0$, and $\mu_k = t_k$ by Lemma [25], so the conclusion holds. Thus we can assume from here on that $\sigma_k > 0$ for all $k \ge 1$, taking another subsequence. For $k = 0, 1, 2, \dots$, let

$I_k(x) := \frac{(\mu_k - x)^2}{\nu\sigma_k^2 + (\mu_k - x)^2},$

with $I_0(x) := 1$ if $\sigma_0 = 0$. Then $0 \le I_k(x) \le 1$ for all $x$ and $k$, a domination condition which is used below without further mention. For $k \ge 1$, since $\sigma_k > 0$, we have by (77) and Lemma 28 that

(90)  $\int I_k\,dP_k = 1/(\nu + 1).$
If $\sigma_0 = +\infty$ and $\mu_0$ is finite, then as $k \to \infty$, $I_k \to 0$ uniformly on compact sets. Since the $P_k$ are uniformly tight, it follows that $\int I_k\,dP_k \to 0$, contradicting (90). If $\mu_0 = \pm\infty$ and $\sigma_0$ is finite, then $I_k \to 1$ uniformly on compact sets, so $\int I_k\,dP_k \to 1$, again contradicting (90).

So we have two remaining situations, $\mu_0$ and $\sigma_0$ both finite or both infinite. First suppose both are finite. If $\sigma_0 > 0$ then as $k \to \infty$, $I_k(x) \to I_0(x)$ uniformly on compact sets. From this, the weak convergence and (90) it follows that $\int I_0(x)\,dQ(x) = 1/(\nu + 1)$, so $\sigma_0 = \sigma(\mu_0)$ for $Q$. For $k = 1, 2, \dots$ let

$J_k(x) := \frac{\mu_k - x}{\nu\sigma_k^2 + (\mu_k - x)^2} \to J_0(x) := \frac{\mu_0 - x}{\nu\sigma_0^2 + (\mu_0 - x)^2}$

uniformly on compact sets. Then $|J_k(x)| \le 1/(2\sqrt{\nu}\,\sigma_k)$ for all $x$ by (75), so the $J_k$ are uniformly bounded for $k$ large enough or for $k = 0$. By Lemma [25], $\sigma_k > 0$ implies that each $P_k$ satisfies (72). Then by Theorem 25(a) as already proved, $(\mu_k, \sigma_k)$ is a critical point for $P_k$, and so by (73) $\int J_k\,dP_k = 0$ for all $k \ge 1$. Then by weak convergence, $\int J_0\,dQ = 0$. Thus $(\mu_0, \sigma_0)$ would be a critical point for $Q$. This implies by (85) that $\mu_0 = 0$, but that contradicts $\int I_0(x)\,dQ(x) = 1/(\nu + 1)$, since $I_0 < 1$ off $\{0\}$, $I_0(0) = 0$ and $Q(\{0\}^c) \le 1/(\nu + 1)$. So $\mu_0$ finite and $\sigma_0 > 0$ are not compatible.

If $\mu_0$ is finite and nonzero and $\sigma_0 = 0$, then we have $I_k(x) \to 1$ except possibly for $x = \mu_0$, and the convergence is uniform on compact subsets of $\{\mu_0\}^c$. Thus

$\liminf_{k \to \infty}\int I_k\,dP_k \ge Q(\{\mu_0\}^c) \ge Q(\{0\}) \ge \frac{\nu}{\nu + 1} > \frac{1}{\nu + 1},$

again contradicting (90).

So the proof is complete except if $\mu_0 = \pm\infty$ and $\sigma_0 = +\infty$. Then by symmetry we can assume that $\mu_0 = +\infty$.

If $\sigma_k = o(\mu_k)$ as $k \to \infty$ then $I_k \to 1$, or if $\mu_k = o(\sigma_k)$ as $k \to \infty$ then $I_k \to 0$, in either case uniformly on compact sets and so contradicting (90). So, taking another subsequence, we can assume that as $k \to \infty$, $\mu_k/\sigma_k \to c$ for some $c$ with $0 < c < \infty$. Then uniformly on bounded intervals, $I_k \to c^2/(\nu + c^2)$ as $k \to \infty$, an increasing function of $c$, so (90) implies that $c = 1$.

Since the $P_k$ are uniformly tight, take a constant $M < \infty$, with $M > 1$, large enough so that $P_k(|x| > M) < 1/(2(\nu + 1))$ for all $k$. On $[-M, M]$, the quantity $j_k(x) := j(x, \mu, \sigma, \nu)$ in parentheses in (71) whose logarithm is taken, for $\mu = \mu_k$ and $\sigma = \sigma_k$, satisfies asymptotically

$j_k(x) \to \frac{\nu + 1}{\nu + x^2} \ge \frac{\nu + 1}{\nu + M^2} > \frac{1}{M^2}.$

Thus up to an additive constant going to $0$ as $k \to \infty$,

(91)  $\frac{\nu + 1}{2}\int_{|x| \le M}\log j_k(x)\,dP_k(x) \ge \frac{\nu + 1}{2}(-2\log M) = -(\nu + 1)\log M.$

Now if $k$ is large enough, $\sigma_k^2 > 1$ and $6\nu\sigma_k^2 > 3\mu_k^2 + 2\nu$. Then

$\nu\sigma_k^2 + (x - \mu_k)^2 \ge (\nu + x^2)/3$

for all $x$, by a short calculation. Thus $j_k(x) \ge 1/(3\sigma_k^2)$ and

$\frac{\nu + 1}{2}\int_{|x| > M}\log j_k(x)\,dP_k(x) \ge \frac{1}{4}(-2\log\sigma_k - \log 3).$

Combining this with (91) and by (71) it follows for a constant $a$ that as $k \to \infty$, $P_kh(\mu_k, \sigma_k) \ge (\log\sigma_k)/2 - a \to +\infty$. But since $P_kh(0, 1) = 0$, this contradicts the assumption that $(\mu_k, \sigma_k)$ give the M-functional of $P_k$, and so completes the proof of continuity of $(\mu_\nu, \sigma_\nu)$ for weak convergence. Since $Q_n \to Q$ weakly a.s. for the empirical measures $Q_n$ of $Q$ (by the Glivenko-Cantelli and Helly-Bray theorems), part (d) and Theorem 25 are proved. □

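Remark. For $\nu > 1$, although $(\mu_\nu, \sigma_\nu)$ is defined and weakly continuous at all laws, it is not Lipschitz at some boundary points (for any norm): in Lemma 27 let $Q_\varepsilon := q_\varepsilon\delta_0 + p_\varepsilon\delta_1$ where $p := p_\varepsilon := (\nu - \varepsilon)/(\nu + 1)$ and $q := q_\varepsilon := (1 + \varepsilon)/(\nu + 1)$, $\varepsilon > 0$. In (81) we find that $\sigma_{p_\varepsilon}^2 = \varepsilon/(\nu - 1) + O(\varepsilon^2)$ as $\varepsilon \downarrow 0$. Let $\|\cdot\|$ be any norm defined on finite signed measures on $\mathbb{R}$, of which $\|\cdot\|_{BL}$ is just one example. Then

(92)  $\|Q_\varepsilon - Q_0\| = \varepsilon\|\delta_1 - \delta_0\|/(\nu + 1),$

(93)  $|\sigma_\nu(Q_\varepsilon) - \sigma_\nu(Q_0)| = \sigma_\nu(Q_\varepsilon) \sim \sqrt{\varepsilon/(\nu - 1)}$

as $\varepsilon \downarrow 0$. Thus $Q \mapsto \sigma_\nu(Q)$ is not Lipschitz and hence not Fréchet differentiable at $Q_0$ with respect to the norm $\|\cdot\|$, whatever it may be. Also, $\sigma_\nu^2$ is not differentiable at $Q_0$ since $d\sigma_\nu^2(Q_\varepsilon)/d\varepsilon$ has left limit $0$ and right limit $1/(\nu - 1) > 0$ at $\varepsilon = 0$.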
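The square-root behavior in (93) is easy to see numerically from (81) and Lemma 27(b). The sketch below (function name ours) evaluates $\sigma_\nu(Q_\varepsilon)$ for shrinking $\varepsilon$ with $\nu = 3$: the ratio $\sigma_\nu(Q_\varepsilon)/\varepsilon$, which by (92) is proportional to $|\sigma_\nu(Q_\varepsilon) - \sigma_\nu(Q_0)|/\|Q_\varepsilon - Q_0\|$, grows without bound, while $\sigma_\nu^2(Q_\varepsilon)/\varepsilon$ approaches $1/(\nu - 1)$.

```python
def sigma_two_point(p, nu):
    """sigma_nu(q delta_0 + p delta_1): formula (81) inside the range, 0 otherwise (Lemma 27(b))."""
    if p <= 1 / (nu + 1) or p >= nu / (nu + 1):
        return 0.0
    q = 1.0 - p
    mu = (nu * p - q) / (nu - 1)
    return (mu * (1.0 - mu)) ** 0.5


nu = 3.0
for eps in (1e-2, 1e-4, 1e-6):
    s = sigma_two_point((nu - eps) / (nu + 1), nu)
    print(eps, s / eps, s**2 / eps)  # s/eps blows up; s^2/eps tends to 1/(nu - 1) = 0.5
```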



10. Appendix 

Derivatives in Banach spaces. Fréchet differentiability is often defined by statisticians, e.g. Huber (1981, §2.5), for functionals defined on the convex set of probability measures. As long as the definition is for a norm, this usually seems to cause no problems. But in this paper we need to apply implicit function theorems, which require that a function(al) be defined on an open set in a Banach space. Thus we need the set $U$ in the following usual mathematicians' definition of Fréchet differentiability to be open. No set of probability measures is open in any Banach space of signed measures.

Let $X$ and $Y$ be Banach spaces over the real numbers. Let $B(X, Y)$ be the space of bounded, i.e. continuous, linear operators $A$ from $X$ into $Y$, with the norm $\|A\| := \sup\{\|Ax\| : \|x\| = 1\}$. Let $U$ be an open subset of $X$, $x \in U$, and $f$ a function from $U$ into $Y$. Then $f$ is said to be Fréchet differentiable at $x$ iff there is an $A \in B(X, Y)$ such that

$f(u) = f(x) + A(u - x) + o(\|u - x\|)$

as $u \to x$. If so let $(Df)(x) := A$. Then $f$ is said to be $C^1$ on $U$ if it is Fréchet differentiable at each $x \in U$ and $x \mapsto Df(x)$ is continuous from $U$ into $B(X, Y)$. Iterating the definition, the second derivative $D^2f(x) = D(Df)(x)$, if it exists for a given $x$, is in $B(X, B(X, Y))$, and the $k$th derivative $D^kf(x)$ will be in $B(X, B(X, B(X, Y))\cdots)$ with $k$ $B$'s. Then $f$ is called $C^k$ on $U$ if its $k$th derivative exists and is continuous on $U$. If $f$ is $C^k$ on $U$ for all $k = 1, 2, \dots$, it is called $C^\infty$ on $U$. In some cases, higher order derivatives will be seen to simplify or to reduce to more familiar notions.

Suppose $X$ is a finite-dimensional space $\mathbb{R}^d$. Let $e_1, \dots, e_d$ be the standard basis vectors of $\mathbb{R}^d$. If $x \in U$, an open set in $\mathbb{R}^d$, and $f : U \to Y$, partial derivatives are defined by $\partial f(x)/\partial x_j := \lim_{t \to 0} (f(x + t e_j) - f(x))/t$, the usual definition except that the functions are $Y$-valued. Just as for real-valued functions, $f$ is $C^1$ from $U$ into $Y$ if and only if each $\partial f/\partial x_j$ for $j = 1, \dots, d$ exists and is continuous from $U$ into $Y$, e.g. by Dieudonné [1960, (8.9.1)] and induction on $d$. Any linear map $A$ from $\mathbb{R}^d$ into $Y$ is automatically continuous and is given by $A(x) = \sum_{j=1}^d x_j A_j$ for some $A_j \in Y$, so we can identify $A$ with $\{A_j\}_{j=1}^d \in Y^d$. Then if $Df(x)$ exists, each $\partial f(x)/\partial x_j$ exists and $Df(x) = \{\partial f(x)/\partial x_j\}_{j=1}^d$.
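The identification of $A$ with $\{A_j\}_{j=1}^d \in Y^d$ can be made concrete. In the sketch below, $Y$ is taken (as an assumption for illustration) to be the space of real $2 \times 2$ matrices stored as nested tuples, $f : \mathbb{R}^2 \to Y$ is a hypothetical polynomial map, the partial derivatives are approximated entrywise by central differences, and $Df(x)h$ is assembled as $\sum_j h_j\, \partial f(x)/\partial x_j$.

```python
# Sketch: partial derivatives of a Y-valued map, with Y = 2x2 real matrices
# (stored as nested tuples). The map f below is a hypothetical example.


def f(x):
    x1, x2 = x
    return ((x1 * x2, x1 ** 2), (x2 ** 3, x1 + x2))


def partial(f, x, j, t=1e-6):
    # central-difference approximation of df(x)/dx_j, computed entrywise in Y
    e = [0.0] * len(x)
    e[j] = t
    xp = [a + b for a, b in zip(x, e)]
    xm = [a - b for a, b in zip(x, e)]
    fp, fm = f(xp), f(xm)
    return tuple(tuple((p - m) / (2 * t) for p, m in zip(rp, rm))
                 for rp, rm in zip(fp, fm))


def apply_linear(A_list, h):
    # A(h) = sum_j h_j A_j, with A identified with (A_1, ..., A_d) in Y^d
    rows, cols = len(A_list[0]), len(A_list[0][0])
    return tuple(tuple(sum(h[j] * A_list[j][r][c] for j in range(len(h)))
                       for c in range(cols)) for r in range(rows))


x = (1.0, 2.0)
Df = [partial(f, x, j) for j in range(2)]   # Df(x) ~ (df/dx_1, df/dx_2)
print(apply_linear(Df, (1.0, 1.0)))
```

At $x = (1, 2)$ the exact partials are $\partial f/\partial x_1 = \begin{pmatrix} 2 & 2 \\ 0 & 1 \end{pmatrix}$ and $\partial f/\partial x_2 = \begin{pmatrix} 1 & 0 \\ 12 & 1 \end{pmatrix}$, so $Df(x)(1, 1)$ is approximately their sum.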

Again as for real-valued functions, we can define higher-order partial derivatives if they exist. Then $f$ is $C^k$ from $U \subset \mathbb{R}^d$ into $Y$ if and only if each partial derivative $D^p f(x) := \partial^{[p]} f/\partial x_1^{p_1} \cdots \partial x_d^{p_d}$, with $p := (p_1, \dots, p_d)$ and $[p] := p_1 + \cdots + p_d \le k$, exists and is continuous from $U$ into $Y$, e.g. by Dieudonné [1960, (8.9.1), (8.12.8)] and induction.
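To make the multi-index notation $D^p$ and the order $[p]$ concrete, here is a small exact sketch for real polynomials on $\mathbb{R}^d$. The dictionary representation `{exponent tuple: coefficient}` and the helper names are hypothetical choices for illustration only; the example also shows that applying $D^{(1,0)}$ and then $D^{(1,1)}$ gives $D^{(2,1)}$.

```python
# Sketch: the multi-index partial derivative
#   D^p f = d^[p] f / (dx_1^p_1 ... dx_d^p_d)
# applied exactly to a real polynomial on R^d. Polynomials are stored as a
# hypothetical dict {exponent tuple: coefficient}, purely for illustration.
from math import prod


def falling(n, k):
    # n (n-1) ... (n-k+1): the factor produced by differentiating x^n k times
    return prod(range(n - k + 1, n + 1)) if k <= n else 0


def D(p, poly):
    out = {}
    for expo, c in poly.items():
        coef = c * prod(falling(n, k) for n, k in zip(expo, p))
        if coef != 0:
            new = tuple(n - k for n, k in zip(expo, p))
            out[new] = out.get(new, 0) + coef
    return out


def order(p):
    return sum(p)  # [p] = p_1 + ... + p_d


poly = {(3, 2): 1, (1, 1): 5}      # x_1^3 x_2^2 + 5 x_1 x_2
print(D((2, 1), poly))              # 12 x_1 x_2
```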

If $Y = \mathbb{R}^m$ is also finite-dimensional, we have $f(u) = \{f_i(u)\}_{i=1}^m$ for some $f_i : U \to \mathbb{R}$, $i = 1, \dots, m$, and $\partial f(x)/\partial x_j = \{\partial f_i(x)/\partial x_j\}_{i=1}^m$ for each $j = 1, \dots, d$, if either the partial derivative on the left, or each one on the right, exists: Dieudonné [1960, (8.12.6)].

Let $X$ and $Y$ be real vector spaces. For $k \ge 1$, a mapping $T : (x_1, \dots, x_k) \mapsto T(x_1, \dots, x_k)$ from $X^k$ into $Y$ is called $k$-linear iff for each $j = 1, \dots, k$, $T$ is linear in $x_j$ when $x_i$ for $i \ne j$ are fixed. $T$ is called symmetric iff for each $\pi \in S_k$, the set of all permutations of $\{1, \dots, k\}$, we have $T(x_{\pi(1)}, \dots, x_{\pi(k)}) = T(x_1, \dots, x_k)$. Any $k$-linear mapping $T$ has a symmetrization $T_s$, which is symmetric, with

$$ T_s(x_1, \dots, x_k) := \frac{1}{k!} \sum_{\pi \in S_k} T(x_{\pi(1)}, \dots, x_{\pi(k)}). $$

A function $g$ from $X$ into $Y$ is called a $k$-homogeneous polynomial iff for some $k$-linear $T : X^k \to Y$, we have $g(x) = g_T(x) := T(x, x, \dots, x)$ for all $x \in X$. Since $g_{T_s} = g_T$, one can assume that $T$ is symmetric. For the following, one can obtain $T$ from $g$ by the "polarization identity," e.g. Chae (1985), Theorem 4.6.
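The symmetrization $T_s$ and the fact $g_{T_s} = g_T$ can be checked on a small example. The sketch below represents a $k$-linear map on $\mathbb{R}^n$ by a coefficient tensor (a hypothetical dict `{(i_1, ..., i_k): c}`), averages over all permutations of the arguments, and compares the diagonal values $T(x, \dots, x)$.

```python
# Sketch: a k-linear map T on R^n given by a coefficient tensor, its
# symmetrization T_s, and the diagonal g_T(x) = T(x, ..., x).
# The tensor entries used below are an arbitrary hypothetical example.
from itertools import permutations
from math import factorial


def multilinear(coeffs, k):
    # coeffs: dict {(i_1, ..., i_k): c}; returns the map (x_1, ..., x_k) -> T(x_1, ..., x_k)
    def T(*xs):
        assert len(xs) == k
        total = 0.0
        for idx, c in coeffs.items():
            term = c
            for x, i in zip(xs, idx):
                term *= x[i]
            total += term
        return total
    return T


def symmetrize(T, k):
    # T_s(x_1, ..., x_k) = (1/k!) * sum over permutations pi of T(x_pi(1), ..., x_pi(k))
    def Ts(*xs):
        return sum(T(*(xs[i] for i in pi)) for pi in permutations(range(k))) / factorial(k)
    return Ts


def homogeneous(T, k):
    # g_T(x) = T(x, x, ..., x)
    return lambda x: T(*([x] * k))


T = multilinear({(0, 1): 3.0, (1, 1): 2.0}, 2)   # T(x, y) = 3 x_0 y_1 + 2 x_1 y_1
Ts = symmetrize(T, 2)
print(Ts((1.0, 0.0), (0.0, 1.0)), Ts((0.0, 1.0), (1.0, 0.0)))
```

Here $T$ itself is not symmetric ($T(e_0, e_1) = 3$, $T(e_1, e_0) = 0$), while $T_s(e_0, e_1) = T_s(e_1, e_0) = 1.5$ and both give the same quadratic form $3x_0x_1 + 2x_1^2$.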



Proposition 29. For any two real vector spaces $X$ and $Y$ and $k = 1, 2, \dots$, there is a 1-1 correspondence between symmetric $k$-linear mappings $T$ from $X^k$ into $Y$ and $k$-homogeneous polynomials $g = g_T$ from $X$ into $Y$.
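One direction of Proposition 29 can be sketched numerically with the standard polarization formula
$T(x_1, \dots, x_k) = \frac{1}{2^k k!} \sum_{\varepsilon \in \{-1, 1\}^k} \varepsilon_1 \cdots \varepsilon_k\, g(\varepsilon_1 x_1 + \cdots + \varepsilon_k x_k)$,
which recovers the symmetric $T$ from $g = g_T$. The cubic $g$ on $\mathbb{R}^2$ below is a hypothetical example; the code is an illustration, not the construction used in the cited reference.

```python
# Sketch: recovering the symmetric k-linear T from a k-homogeneous polynomial g
# by the polarization formula
#   T(x_1, ..., x_k) = (1 / (2^k k!)) * sum over eps in {-1, 1}^k of
#                      eps_1 ... eps_k * g(eps_1 x_1 + ... + eps_k x_k).
# The cubic g below is a hypothetical example on R^2.
from itertools import product
from math import factorial, prod


def g(x):
    return x[0] ** 3 + 2.0 * x[0] * x[1] ** 2


def polarize(g, k):
    def T(*xs):
        n = len(xs[0])
        total = 0.0
        for eps in product((-1, 1), repeat=k):
            # the vector eps_1 x_1 + ... + eps_k x_k, coordinate by coordinate
            v = [sum(e * xj[i] for e, xj in zip(eps, xs)) for i in range(n)]
            total += prod(eps) * g(v)
        return total / (2 ** k * factorial(k))
    return T


T = polarize(g, 3)
x = (1.0, 2.0)
print(T(x, x, x), g(x))   # the diagonal of T reproduces g
```

Since the monomial $x_0 x_1^2$ arises from three orderings of the arguments, the recovered coefficient is $T(e_0, e_1, e_1) = 2/3$.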

Now suppose $(X, \|\cdot\|)$ and $(Y, |\cdot|)$ are normed vector spaces. It is known and not hard to show that a $k$-linear mapping $T$ from $X^k$ into $Y$ is jointly continuous if and only if

$$ \|T\| := \sup\{|T(x_1, \dots, x_k)| : \|x_1\| = \cdots = \|x_k\| = 1\} < \infty, $$

and that a $k$-homogeneous polynomial $g$ from $X$ into $Y$ is continuous if and only if $\|g\| := \sup\{|g(x)| : \|x\| = 1\} < \infty$. In general, for a symmetric $k$-linear $T$ with $\|T\| < \infty$ we have $\|g_T\| \le \|T\| \le k^k \|g_T\|/k!$, e.g. Chae (1985), Theorem 4.13. The bounds are sharp in general Banach spaces [Kopec and Musielak (1956)], but if $X$ is a Hilbert space we have $\|g_T\| = \|T\|$ [Bochnak and Siciak (1971)].
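A brute-force sketch of the Hilbert-space case $k = 2$ on Euclidean $\mathbb{R}^2$: for a symmetric bilinear $T(x, y) = x^\top M y$, both $\|T\|$ and $\|g_T\|$ equal the largest absolute eigenvalue of $M$ (by the spectral theorem). The matrix $M$ and the sampling scheme are hypothetical choices; sampling only approximates the suprema from below.

```python
# Sketch: for the Euclidean (Hilbert) norm on R^2, compare
#   ||T|| = sup{|T(x, y)| : |x| = |y| = 1}  and  ||g_T|| = sup{|T(x, x)| : |x| = 1}
# for a symmetric bilinear T(x, y) = x^T M y, by sampling the unit circle.
# M is a hypothetical example; both values should approach the largest
# |eigenvalue| of M, here (1 + sqrt(13)) / 2.
import math

M = ((2.0, 1.0), (1.0, -1.0))


def T(x, y):
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))


def unit_circle(n):
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]


pts = unit_circle(300)
norm_T = max(abs(T(x, y)) for x in pts for y in pts)
norm_g = max(abs(T(x, x)) for x in pts)
print(norm_T, norm_g, (1 + math.sqrt(13)) / 2)
```

This does not illustrate the sharpness of $\|T\| \le k^k \|g_T\|/k!$, which needs a non-Hilbert norm.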

If $f$ is a $C^k$ function from an open set $U \subset X$ into $Y$, then at each $x \in U$, $D^k f(x)$ defines a $k$-linear mapping $d^k f(x)$ from $X^k$ into $Y$,

$$ d^k f(x)(x_1, \dots, x_k) := (\cdots((D^k f)(x)(x_1))(x_2)\cdots)(x_k). \tag{94} $$

Then $d^k f(x)$ is symmetric, e.g. Chae (1985), Theorem 7.9. The corresponding $k$-homogeneous polynomial $u \mapsto g_{d^k f(x)}(u)$ will be written as $u \mapsto d^k f(x) u^{\otimes k}$.
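For $k = 2$, (94) says $d^2 f(x)(u, v)$ is obtained by differentiating the linear map $Df(\cdot)v$ in the direction $u$. The sketch below does this by central differences for a hypothetical $f(x) = \sin(x_0)e^{x_1} + x_0 x_1^3$ whose gradient is coded by hand, so the symmetry $d^2 f(x)(u, v) = d^2 f(x)(v, u)$ is a genuine numerical check rather than something built into the formula.

```python
# Sketch: d^2 f(x)(u, v) = ((D^2 f)(x)(u))(v), approximated by differentiating
# Df(.)v in the direction u, for a hypothetical f: R^2 -> R.
import math


def grad(x):
    # gradient of f(x) = sin(x0) * exp(x1) + x0 * x1**3
    return (math.cos(x[0]) * math.exp(x[1]) + x[1] ** 3,
            math.sin(x[0]) * math.exp(x[1]) + 3 * x[0] * x[1] ** 2)


def Df_apply(x, v):
    # Df(x)v, a real number
    g = grad(x)
    return g[0] * v[0] + g[1] * v[1]


def d2f(x, u, v, s=1e-5):
    # central difference of s -> Df(x + s u)v at s = 0
    xp = (x[0] + s * u[0], x[1] + s * u[1])
    xm = (x[0] - s * u[0], x[1] - s * u[1])
    return (Df_apply(xp, v) - Df_apply(xm, v)) / (2 * s)


x, u, v = (0.3, -0.8), (1.0, 2.0), (-0.5, 0.4)
print(d2f(x, u, v), d2f(x, v, u))   # approximately equal
```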

Also, $f$ will be called analytic from $U$ into $Y$ iff it is $C^\infty$ and for each $x \in U$ there exist an $r > 0$ and $k$-homogeneous polynomials $V_k$ from $X$ into $Y$ for each $k \ge 1$ such that for any $v \in X$ with $\|v - x\| < r$, we have $v \in U$ and

$$ f(v) = f(x) + \sum_{k=1}^{\infty} V_k(v - x). \tag{95} $$

It is known that then necessarily, for each $k \ge 1$ and $u \in X$,

$$ V_k(u) = d^k f(x) u^{\otimes k}/k!. \tag{96} $$
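A one-dimensional sketch of (95) and (96), using the hypothetical example $f(v) = 1/(1 - v)$ on $U = (-\infty, 1)$ with $X = Y = \mathbb{R}$. Here $d^k f(x) u^{\otimes k} = k!\, u^k/(1 - x)^{k+1}$, so the series converges for $|v - x| < r = 1 - x$. It also shows that $r$ may be smaller than the distance to the boundary of $U$ in only one direction: points of $U$ outside the ball are not covered by the expansion at $x$.

```python
# Sketch of (95)-(96) for the hypothetical f(v) = 1/(1 - v) on U = (-inf, 1).
# V_k(u) = d^k f(x) u^k / k! = u^k / (1 - x)^(k+1); radius r = 1 - x.
from math import factorial


def f(v):
    return 1.0 / (1.0 - v)


def dk_f(x, k):
    # k-th derivative of 1/(1 - v) at v = x
    return factorial(k) / (1.0 - x) ** (k + 1)


def taylor_partial_sum(x, v, n):
    # f(x) + sum_{k=1}^{n} V_k(v - x), with V_k(u) = d^k f(x) u^k / k!
    u = v - x
    return f(x) + sum(dk_f(x, k) * u ** k / factorial(k) for k in range(1, n + 1))


x = 0.2
print(taylor_partial_sum(x, 0.6, 60), f(0.6))     # |v - x| = 0.4 < r = 0.8: agree
print(taylor_partial_sum(x, -0.7, 60), f(-0.7))   # |v - x| = 0.9 > r: diverges
```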

For any Banach space $X$ let $(X', \|\cdot\|')$ be the dual Banach space $B(X, \mathbb{R})$. The product $X' \times X$ with coordinatewise operations is a vector space and a Banach space with the norm $\|(\phi, x)\| := \|\phi\|' + \|x\|$. The mapping $\gamma : (\phi, x) \mapsto \phi(x)$ is $C^\infty$ from $X' \times X$ into $\mathbb{R}$ (it is analytic and a 2-homogeneous polynomial): for $\phi, \psi \in X'$ and $x, y \in X$ we have

$$ \gamma(\psi, y) = \psi(y) = \phi(x) + (\psi - \phi)(x) + \phi(y - x) + (\psi - \phi)(y - x). $$

As $(\psi, y) \to (\phi, x)$, clearly $(\psi - \phi)(x)$ and $\phi(y - x)$ give first derivative terms and $(\psi - \phi)(y - x)$ a second derivative term. We have that $D^2\gamma$ is continuous, indeed constant: it has the fixed value $(\eta, u) \mapsto ((\zeta, v) \mapsto \eta(v) + \zeta(u))$ in $B(X' \times X, B(X' \times X, \mathbb{R}))$, so $D^3\gamma = 0$.
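The expansion of $\gamma(\psi, y)$ and the constant second derivative can be verified exactly in a finite-dimensional sketch with $X = \mathbb{R}^3$, where (as an assumption for illustration) a functional $\phi \in X'$ is represented by its coefficient vector. Since $(s, t) \mapsto \gamma(\phi + s\eta + t\zeta, x + su + tv)$ is a polynomial of degree at most 2, a mixed second difference picks out the $st$-coefficient $\eta(v) + \zeta(u)$ exactly, independently of the base point $(\phi, x)$.

```python
# Sketch: gamma(phi, x) = phi(x) on X' x X with X = R^3, functionals represented
# by coefficient vectors (an illustrative assumption). Integer inputs keep the
# arithmetic exact.


def pair(phi, x):
    return sum(a * b for a, b in zip(phi, x))


def sub(a, b):
    return [p - q for p, q in zip(a, b)]


def expansion(phi, x, psi, y):
    # phi(x) + (psi - phi)(x) + phi(y - x) + (psi - phi)(y - x)
    return (pair(phi, x) + pair(sub(psi, phi), x)
            + pair(phi, sub(y, x)) + pair(sub(psi, phi), sub(y, x)))


def mixed_second_difference(phi, x, eta, u, zeta, v):
    # (s, t) -> gamma(phi + s*eta + t*zeta, x + s*u + t*v) has degree <= 2, so this
    # difference equals its st-coefficient eta(v) + zeta(u) = D^2 gamma[(eta, u), (zeta, v)]
    def gam(s, t):
        return pair([p + s * e + t * z for p, e, z in zip(phi, eta, zeta)],
                    [q + s * a + t * b for q, a, b in zip(x, u, v)])
    return gam(1, 1) - gam(1, 0) - gam(0, 1) + gam(0, 0)


phi, x, psi, y = [1, -2, 3], [4, 0, -1], [2, 5, -1], [-3, 1, 2]
print(expansion(phi, x, psi, y), pair(psi, y))   # equal
```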

If $U$ is an open subset of a Banach space $Y$ and $f$ is a $C^k$ function from $U$ into $X$, then

$$ (\phi, u) \mapsto \phi(f(u)) \tag{97} $$

is $C^k$ on $X' \times U$ by a chain rule, e.g. Dieudonné [1960, (8.12.10)].
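For $k = 1$, combining the chain rule with the derivative of $\gamma$ gives $D\Phi(\phi, u)(\eta, v) = \eta(f(u)) + \phi(Df(u)v)$ for $\Phi(\phi, u) := \phi(f(u))$. The sketch below checks this formula against a central difference for a hypothetical $f : \mathbb{R}^2 \to \mathbb{R}^2$ (so $Y = X = \mathbb{R}^2$ here), with functionals represented by coefficient vectors.

```python
# Sketch: Phi(phi, u) = phi(f(u)) with derivative
#   DPhi(phi, u)(eta, v) = eta(f(u)) + phi(Df(u) v),
# for a hypothetical f: R^2 -> R^2; functionals are coefficient vectors.
import math


def f(u):
    return (math.exp(u[0]) * u[1], u[0] ** 2 - math.sin(u[1]))


def Df(u, v):
    # Df(u)v computed by hand from the formula for f
    return (math.exp(u[0]) * u[1] * v[0] + math.exp(u[0]) * v[1],
            2 * u[0] * v[0] - math.cos(u[1]) * v[1])


def Phi(phi, u):
    fu = f(u)
    return phi[0] * fu[0] + phi[1] * fu[1]


def DPhi(phi, u, eta, v):
    fu, dfv = f(u), Df(u, v)
    return eta[0] * fu[0] + eta[1] * fu[1] + phi[0] * dfv[0] + phi[1] * dfv[1]


def DPhi_numeric(phi, u, eta, v, s=1e-6):
    # central difference of Phi along the direction (eta, v)
    plus = Phi([p + s * e for p, e in zip(phi, eta)], [a + s * b for a, b in zip(u, v)])
    minus = Phi([p - s * e for p, e in zip(phi, eta)], [a - s * b for a, b in zip(u, v)])
    return (plus - minus) / (2 * s)


args = ((1.5, -0.5), (0.2, 0.7), (0.3, 1.0), (-1.0, 0.5))
print(DPhi(*args), DPhi_numeric(*args))
```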

For a point $x$ in a normed space $(X, \|\cdot\|)$ denote the open ball of radius $r$ around $x$ by $B_r(x) := \{y \in X : \|y - x\| < r\}$. The Hildebrandt-Graves implicit function theorem and related facts, essentially as stated by Deimling (1985, Theorem 15.1 p. 148, Corollary 15.1 p. 150, and Theorem 15.3 p. 151), are as follows:

Theorem 30. Let $X, Y, Z$ be real Banach spaces, and $U \subset X$ and $V \subset Y$ neighborhoods of $x_0$ and $y_0$ respectively. Let $F : U \times V \to Z$ be jointly continuous, and continuously differentiable with respect to $y \in V$. Let $F_2$ be the (partial Fréchet) derivative of $F$ with respect to $y \in V$, so that for each $x \in U$ and $y \in V$, $F_2(x, y)(\cdot)$ is a bounded linear operator from $Y$ into $Z$. Suppose that $F(x_0, y_0) = 0$ and that $F_2(x_0, y_0)(\cdot)$ is onto $Z$ and has a bounded inverse, i.e. it is a topological isomorphism of $Y$ onto $Z$. Then there exist $r > 0$ and $\delta > 0$ with $B_r(x_0) \subset U$ and $B_\delta(y_0) \subset V$ such that there is exactly one map $T$ from $B_r(x_0)$ into $B_\delta(y_0)$ with $F(x, T(x)) = 0$ for all $x \in B_r(x_0)$, and:

(a) $T$ is continuous.

(b) If for some $m \ge 1$, $F \in C^m(U \times V)$, then for some $\rho$ with $0 < \rho \le r$, $T$ is $C^m$ on $B_\rho(x_0)$.

(c) If $F$ is analytic on $U \times V$, then for some $\tau$ with $0 < \tau \le r$, $T$ is analytic on $B_\tau(x_0)$.
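A sketch of Theorem 30 in the simplest case $X = Y = Z = \mathbb{R}$, with the hypothetical $F(x, y) = y^3 + y - x$. Here $F_2(x, y) = 3y^2 + 1 \ge 1$ is always invertible, the implicit map $T$ is computed by Newton's method in $y$ (a standard numerical choice, not the method of proof in Deimling), and differentiating $F(x, T(x)) = 0$ gives $T'(x) = -F_1/F_2 = 1/(3T(x)^2 + 1)$, which is compared with a difference quotient.

```python
# Sketch of Theorem 30 with X = Y = Z = R and the hypothetical F(x, y) = y**3 + y - x.


def F(x, y):
    return y ** 3 + y - x


def F2(x, y):
    # partial derivative of F with respect to y; always >= 1, hence invertible
    return 3 * y ** 2 + 1


def implicit_T(x, y0=0.0, tol=1e-14, max_iter=100):
    # Newton iteration y <- y - F(x, y) / F_2(x, y)
    y = y0
    for _ in range(max_iter):
        step = F(x, y) / F2(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y


def T_prime(x):
    # from F(x, T(x)) = 0: T'(x) = -F_1 / F_2 = 1 / F_2(x, T(x)), since F_1 = -1
    return 1.0 / F2(x, implicit_T(x))


h = 1e-5
print(implicit_T(2.0), T_prime(2.0), (implicit_T(2.0 + h) - implicit_T(2.0 - h)) / (2 * h))
```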

The two Banach spaces $Y$ and $Z$ are topologically isomorphic if they are finite-dimensional and of the same dimension, e.g. both are $\mathbb{R}^d$, or both are as in the present paper. Then we need that the linear transformation $F_2(x_0, y_0)(\cdot)$, or the associated matrix of partial derivatives in coordinates, is non-singular.
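In coordinates, the hypothesis becomes a non-zero determinant of the matrix $\partial F_i/\partial y_j$ at $(x_0, y_0)$. The sketch below uses a hypothetical $F : \mathbb{R} \times \mathbb{R}^2 \to \mathbb{R}^2$, $F(x, y) = (y_1 + y_2^2 - x,\ y_1 y_2 - 1)$, which vanishes at $(x_0, y_0) = (2, (1, 1))$ where the determinant is $-1$, and then solves $F(x, y) = 0$ for a nearby $x$ by a two-dimensional Newton iteration using Cramer's rule.

```python
# Sketch: finite-dimensional case with X = R, Y = Z = R^2 and the hypothetical
#   F(x, y) = (y1 + y2**2 - x, y1*y2 - 1),  F(2, (1, 1)) = (0, 0).


def F(x, y):
    return (y[0] + y[1] ** 2 - x, y[0] * y[1] - 1.0)


def jac_y(x, y):
    # matrix of partial derivatives dF_i/dy_j
    return ((1.0, 2.0 * y[1]), (y[1], y[0]))


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def solve_implicit(x, y0=(1.0, 1.0), tol=1e-13, max_iter=50):
    # 2-D Newton iteration; each step solves jac_y * delta = -F by Cramer's rule
    y = list(y0)
    for _ in range(max_iter):
        J, r = jac_y(x, y), F(x, y)
        d = det2(J)
        if d == 0:
            raise ZeroDivisionError("singular Jacobian")
        delta0 = (-r[0] * J[1][1] + r[1] * J[0][1]) / d
        delta1 = (-r[1] * J[0][0] + r[0] * J[1][0]) / d
        y[0] += delta0
        y[1] += delta1
        if abs(delta0) + abs(delta1) < tol:
            break
    return tuple(y)


print(det2(jac_y(2.0, (1.0, 1.0))))   # non-singular at (x0, y0)
print(solve_implicit(2.1))
```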